TY - GEN
T1 - On provenance and privacy
AU - Davidson, Susan B.
AU - Khanna, Sanjeev
AU - Roy, Sudeepa
AU - Stoyanovich, Julia
AU - Tannen, Val
AU - Chen, Yi
PY - 2011
Y1 - 2011
N2 - Provenance in scientific workflows is a double-edged sword. On the one hand, recording information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, enables transparency and reproducibility of results. On the other hand, a scientific workflow often contains private or confidential data and uses proprietary modules. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper we discuss privacy concerns in scientific workflows - data, module, and structural privacy - and frame several natural questions: (i) Can we formally analyze data, module, and structural privacy, giving provable privacy guarantees for an unlimited/bounded number of provenance queries? (ii) How can we answer search and structural queries over repositories of workflow specifications and their executions, providing as much information as possible to the user while still guaranteeing privacy? We then highlight some recent work in this area and point to several directions for future work.
AB - Provenance in scientific workflows is a double-edged sword. On the one hand, recording information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, enables transparency and reproducibility of results. On the other hand, a scientific workflow often contains private or confidential data and uses proprietary modules. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper we discuss privacy concerns in scientific workflows - data, module, and structural privacy - and frame several natural questions: (i) Can we formally analyze data, module, and structural privacy, giving provable privacy guarantees for an unlimited/bounded number of provenance queries? (ii) How can we answer search and structural queries over repositories of workflow specifications and their executions, providing as much information as possible to the user while still guaranteeing privacy? We then highlight some recent work in this area and point to several directions for future work.
KW - Privacy
KW - Provenance
KW - Scientific workflows
UR - http://www.scopus.com/inward/record.url?scp=79952335883&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952335883&partnerID=8YFLogxK
U2 - 10.1145/1938551.1938554
DO - 10.1145/1938551.1938554
M3 - Conference contribution
AN - SCOPUS:79952335883
SN - 9781450305297
T3 - ACM International Conference Proceeding Series
SP - 3
EP - 10
BT - Database Theory - ICDT 2011
PB - Association for Computing Machinery
ER -