Existing workflow management systems assume that scientists have a well-specified workflow design before execution begins. In reality, many scientific discoveries emerge from a dynamic process in which scientists repeatedly propose new hypotheses and test them through multiple rounds of experiments before obtaining a successful result. Consequently, not every experiment in a workflow execution necessarily contributes to the final result. In this paper, we investigate the problem of effectively reproducing the results of previous scientific workflow executions by discovering the critical experiments that led to success and the logical constraints on their execution order. We design a relational schema and SQL queries for effectively recording the workflow execution log, efficiently identifying the critical experiments from the log, and recommending experiment reproduction strategies to users. Furthermore, we propose optimization techniques for evaluating these SQL queries that exploit the unique characteristics of the log data. Experimental evaluations demonstrate the performance speedup achieved by our approach.
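To make the idea of identifying critical experiments from a relational log concrete, the following is a minimal sketch and not the schema or queries of the paper. It assumes a hypothetical log with two tables, `experiment` and `dependency` (all table names, column names, and sample rows are invented for illustration), and uses a recursive SQL query to collect every experiment that the final successful experiment transitively depends on. The dependency edges among those experiments then serve as an illustrative stand-in for the constraints on execution order.

```python
import sqlite3

# Hypothetical schema: one row per experiment, one row per data dependency.
# These names are assumptions for illustration, not the paper's design.
SCHEMA = """
CREATE TABLE experiment (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    succeeded INTEGER NOT NULL          -- 1 if the run succeeded, else 0
);
CREATE TABLE dependency (
    consumer_id INTEGER NOT NULL REFERENCES experiment(id),
    producer_id INTEGER NOT NULL REFERENCES experiment(id)
);
"""

# Walk the dependency edges backwards from the final experiment.
# UNION (not UNION ALL) removes duplicates, so shared ancestors appear once.
CRITICAL_CTE = """
WITH RECURSIVE critical(id) AS (
    SELECT ?
    UNION
    SELECT d.producer_id
    FROM dependency AS d
    JOIN critical AS c ON d.consumer_id = c.id
)
"""


def build_log(conn):
    """Create the hypothetical tables and load a small invented log."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO experiment VALUES (?, ?, ?)", [
        (1, "prepare_sample", 1),
        (2, "assay_v1", 0),           # failed attempt, later abandoned
        (3, "assay_v2", 1),
        (4, "side_analysis", 1),      # succeeded but unused by the final result
        (5, "final_measurement", 1),
    ])
    conn.executemany("INSERT INTO dependency VALUES (?, ?)", [
        (2, 1), (3, 1), (4, 2), (5, 3),
    ])


def critical_experiments(conn, final_id):
    """Return (id, name) for every experiment the final result depends on."""
    sql = CRITICAL_CTE + """
    SELECT e.id, e.name
    FROM experiment AS e
    JOIN critical AS c ON e.id = c.id
    ORDER BY e.id
    """
    return conn.execute(sql, (final_id,)).fetchall()


def order_constraints(conn, final_id):
    """Return (producer_id, consumer_id) pairs: producer must run first."""
    sql = CRITICAL_CTE + """
    SELECT d.producer_id, d.consumer_id
    FROM dependency AS d
    JOIN critical AS c ON d.consumer_id = c.id
    ORDER BY d.producer_id, d.consumer_id
    """
    return conn.execute(sql, (final_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_log(conn)
    print(critical_experiments(conn, 5))
    print(order_constraints(conn, 5))
```

In this invented log, the failed `assay_v1` run and the `side_analysis` run that builds on it are excluded, because the final measurement never consumes their output. A real execution log would likely need richer provenance (parameters, timestamps, data artifacts) than this sketch records.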
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Computer Networks and Communications
Keywords
- Database system
- Join algorithm
- Scientific workflow