TY - GEN
T1 - GenerIE
T2 - 26th IEEE International Conference on Data Engineering, ICDE 2010
AU - Tari, Luis
AU - Tu, Phan Huy
AU - Hakenberg, Jörg
AU - Chen, Yi
AU - Son, Tran Cao
AU - Gonzalez, Graciela
AU - Baral, Chitta
PY - 2010
Y1 - 2010
AB - Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules. A major drawback of this approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be re-applied from scratch to the entire text corpus, even though only a small part of the corpus might be affected. In this demonstration proposal, we describe a novel paradigm for information extraction: we store the parse trees output by text processing in a database, and then express extraction needs as queries, which the database can evaluate and optimize. Compared with existing approaches, database queries for information extraction enable generic extraction and minimize reprocessing. However, this approach also poses many technical challenges, such as language design, optimization, and automatic query generation. We present the opportunities and challenges that we encountered when building GenerIE, a system that implements this paradigm.
UR - http://www.scopus.com/inward/record.url?scp=77952779235&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952779235&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2010.5447773
DO - 10.1109/ICDE.2010.5447773
M3 - Conference contribution
AN - SCOPUS:77952779235
SN - 9781424454440
T3 - Proceedings - International Conference on Data Engineering
SP - 1121
EP - 1124
BT - 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings
Y2 - 1 March 2010 through 6 March 2010
ER -