Abstract
Existing workflow management systems assume that scientists have a well-specified workflow design before execution. In reality, many scientific discoveries result from a dynamic process in which scientists repeatedly propose new hypotheses and test them through multiple attempts at various experiments before obtaining successful results. Consequently, not every experiment in a workflow execution necessarily contributes to the final result. In this paper, we investigate the problem of effectively reproducing the results of previous scientific workflow executions by discovering the critical experiments that led to the success and the logical constraints on their execution order. We design a relational schema and SQL queries for effectively recording the workflow execution log, efficiently identifying the critical experiments from the log, and recommending experiment reproduction strategies to users. Furthermore, we propose optimization techniques for evaluating such SQL queries, tailored to the unique characteristics of the log data. Experimental evaluations demonstrate the performance speedup of our approach.
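As a rough illustration of the idea of finding critical experiments in a relational log, the sketch below stores a toy execution log in SQLite and walks backwards from the final result with a recursive query. The tables `used` and `produced`, the identifiers `d0` to `d4`, and the function names are all hypothetical; they are assumptions made for this example and are not the schema, queries, or optimizations proposed in the paper.

```python
import sqlite3


def build_demo_log():
    """Create a toy, in-memory execution log (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE used     (exp_id INTEGER, data_id TEXT);  -- inputs read
        CREATE TABLE produced (exp_id INTEGER, data_id TEXT);  -- outputs written
        """
    )
    # Experiment 2 is a dead end: its output d2 is never used again.
    conn.executemany(
        "INSERT INTO used VALUES (?, ?)",
        [(1, "d0"), (2, "d1"), (3, "d1"), (4, "d3")],
    )
    conn.executemany(
        "INSERT INTO produced VALUES (?, ?)",
        [(1, "d1"), (2, "d2"), (3, "d3"), (4, "d4")],
    )
    return conn


def critical_experiments(conn, final_data_id):
    """Experiments whose outputs transitively fed the final data item."""
    rows = conn.execute(
        """
        WITH RECURSIVE needed(data_id) AS (
            SELECT ?
            UNION
            SELECT u.data_id
            FROM needed n
            JOIN produced p ON p.data_id = n.data_id
            JOIN used u ON u.exp_id = p.exp_id
        )
        SELECT DISTINCT p.exp_id
        FROM needed n
        JOIN produced p ON p.data_id = n.data_id
        ORDER BY p.exp_id
        """,
        (final_data_id,),
    ).fetchall()
    return [row[0] for row in rows]


def order_constraints(conn, critical):
    """Pairs (a, b): critical experiment a must run before critical b."""
    marks = ",".join("?" * len(critical))
    return conn.execute(
        f"""
        SELECT DISTINCT p.exp_id, u.exp_id
        FROM produced p
        JOIN used u ON u.data_id = p.data_id
        WHERE p.exp_id IN ({marks}) AND u.exp_id IN ({marks})
        ORDER BY 1, 2
        """,
        critical + critical,
    ).fetchall()


conn = build_demo_log()
critical = critical_experiments(conn, "d4")
print(critical)                           # [1, 3, 4]
print(order_constraints(conn, critical))  # [(1, 3), (3, 4)]
```

In this toy log, experiment 2 is excluded because nothing derived from its output reaches `d4`, and the returned pairs give a partial order in which the remaining experiments could be re-run.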
Original language | English (US) |
---|---|
Pages (from-to) | 577-585 |
Number of pages | 9 |
Journal | Future Generation Computer Systems |
Volume | 25 |
Issue number | 5 |
State | Published - May 2009 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Software
- Hardware and Architecture
- Computer Networks and Communications
Keywords
- Database system
- Join algorithm
- Log
- Scientific workflow