This project addresses a systemic problem in scientific research: although datasets collected through scientific protocols may be properly stored, the protocol itself is often recorded only on paper or stored electronically as the script developed to implement it. Once the scientist who implemented the protocol leaves the laboratory, this record may be lost. Collected datasets become meaningless without a description of the process used to produce them; furthermore, the experiment designed to produce the data cannot be reproduced.
This research is developing a database (ProtocolDB) to manage scientific protocols and the datasets collected from their execution. The approach will allow scientists to query, compare, and revise protocols, and to express queries that span both protocols and data. The research also addresses the recording and querying of data provenance (the why and where of data). ProtocolDB will benefit scientists by providing a scientific portfolio for the laboratory that not only enables querying and reasoning about protocols, executions of protocols, and collected datasets, but also enables data sharing and collaboration between teams.
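To illustrate what a query across protocols, executions, and collected datasets might look like, the following is a minimal sketch using Python's built-in `sqlite3` module. It is not ProtocolDB's actual model or query language, which this summary does not specify. The table names (`protocol`, `execution`, `dataset`), their columns, the helper functions, and all sample rows are hypothetical and invented for illustration only.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE protocol (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            version INTEGER NOT NULL
        );
        CREATE TABLE execution (
            id INTEGER PRIMARY KEY,
            protocol_id INTEGER NOT NULL REFERENCES protocol(id),
            run_by TEXT NOT NULL,
            run_date TEXT NOT NULL
        );
        CREATE TABLE dataset (
            id INTEGER PRIMARY KEY,
            execution_id INTEGER NOT NULL REFERENCES execution(id),
            label TEXT NOT NULL
        );
        """
    )
    # Made-up sample rows: two versions of one protocol, one run of each.
    conn.executemany(
        "INSERT INTO protocol VALUES (?, ?, ?)",
        [(1, "alignment", 1), (2, "alignment", 2)],
    )
    conn.executemany(
        "INSERT INTO execution VALUES (?, ?, ?, ?)",
        [(10, 1, "alice", "2007-01-15"), (11, 2, "bob", "2007-03-01")],
    )
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?, ?)",
        [(100, 10, "hits_v1"), (101, 11, "hits_v2")],
    )
    conn.commit()
    return conn


def provenance(conn, dataset_label):
    """Return (protocol name, version, run_by, run_date) for a dataset label."""
    return conn.execute(
        """
        SELECT p.name, p.version, e.run_by, e.run_date
        FROM dataset AS d
        JOIN execution AS e ON e.id = d.execution_id
        JOIN protocol AS p ON p.id = e.protocol_id
        WHERE d.label = ?
        """,
        (dataset_label,),
    ).fetchone()


def datasets_for_protocol(conn, name):
    """List (version, dataset label) pairs across all versions of a protocol."""
    return conn.execute(
        """
        SELECT p.version, d.label
        FROM protocol AS p
        JOIN execution AS e ON e.protocol_id = p.id
        JOIN dataset AS d ON d.execution_id = e.id
        WHERE p.name = ?
        ORDER BY p.version
        """,
        (name,),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(provenance(db, "hits_v2"))
    print(datasets_for_protocol(db, "alignment"))
```

In this sketch, linking each dataset to the execution that produced it, and each execution to a specific protocol version, is what lets a dataset remain interpretable after the person who ran the protocol has left the laboratory.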
The intellectual merit of the research includes the design of a model for scientific workflows and a query language to retrieve, transform, and compare scientific workflows, integrate datasets, and reason about data provenance. This theoretical contribution will advance the development of systems that support the expression of scientific protocols. The ProtocolDB implementation will be evaluated by our scientific partners. The broader impact of the project is the development of a general-purpose system for managing scientific protocols and their collected datasets. The established collaborations, involving academic, governmental, and private institutions, will contribute significantly to the breadth of its use.
Effective start/end date: 8/15/06 → 7/31/12
- National Science Foundation: $515,996.00