GenerIE: Information extraction using database queries

Luis Tari, Phan Huy Tu, Jörg Hakenberg, Yi Chen, Tran Cao Son, Graciela Gonzalez, Chitta Baral

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules. A major drawback of this approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be re-applied from scratch to the entire text corpus, even though only a small part of the corpus might be affected. In this demonstration proposal, we describe a novel paradigm for information extraction: we store the parse trees output by text processing in a database, and then express extraction needs as queries, which the database can evaluate and optimize. Compared with existing approaches, database queries for information extraction enable generic extraction and minimize reprocessing. However, this approach also poses many technical challenges, such as language design, optimization, and automatic query generation. We will present the opportunities and challenges that we encountered when building GenerIE, a system that implements this paradigm.
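
As a rough illustration of the paradigm described in the abstract, the following minimal sketch stores a few hand-written dependency-style parse trees in a relational table and expresses an extraction goal as a query. Everything in it is an assumption made for illustration: the `parse_node` schema, the relation labels, the sample sentences, and the use of plain SQL through Python's `sqlite3` module. The abstract does not describe GenerIE's actual storage layout or query language, so this should not be read as the authors' implementation.

```python
import sqlite3

# Hypothetical schema: one row per parse-tree node, linked to its parent.
# parent_id is NULL for the root of each sentence's tree.
SCHEMA = """
CREATE TABLE parse_node (
    sent_id   INTEGER,
    node_id   INTEGER,
    parent_id INTEGER,
    relation  TEXT,
    word      TEXT,
    PRIMARY KEY (sent_id, node_id)
)
"""

# Hand-written toy parses standing in for the output of a real parser.
DEMO_NODES = [
    (1, 1, None, "root", "inhibits"),
    (1, 2, 1, "nsubj", "aspirin"),
    (1, 3, 1, "dobj", "COX1"),
    (2, 1, None, "root", "activates"),
    (2, 2, 1, "nsubj", "RAD53"),
    (2, 3, 1, "dobj", "DUN1"),
]

# Extraction goal as a query: (subject, verb, object) triples for a verb.
EXTRACT_TRIPLES = """
SELECT s.word, v.word, o.word
FROM parse_node AS v
JOIN parse_node AS s
  ON s.sent_id = v.sent_id AND s.parent_id = v.node_id AND s.relation = 'nsubj'
JOIN parse_node AS o
  ON o.sent_id = v.sent_id AND o.parent_id = v.node_id AND o.relation = 'dobj'
WHERE v.word = ?
ORDER BY v.sent_id
"""


def load_demo():
    """Create an in-memory database and store the toy parse trees once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO parse_node VALUES (?, ?, ?, ?, ?)", DEMO_NODES)
    return conn


def extract(conn, verb):
    """Run the extraction query for one verb against the stored parses."""
    return conn.execute(EXTRACT_TRIPLES, (verb,)).fetchall()


if __name__ == "__main__":
    conn = load_demo()
    print(extract(conn, "inhibits"))
    # A new extraction goal is just another query over the same stored parses;
    # no sentence is parsed again.
    print(extract(conn, "activates"))
```

In this sketch, changing the extraction goal means changing only the query parameter or the query text, while the stored parse trees are reused as they are. That is the reprocessing saving the abstract refers to, shown here at toy scale.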

Original language: English (US)
Title of host publication: 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings
Pages: 1121-1124
Number of pages: 4
State: Published - 2010
Externally published: Yes
Event: 26th IEEE International Conference on Data Engineering, ICDE 2010 - Long Beach, CA, United States
Duration: Mar 1 2010 to Mar 6 2010

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 26th IEEE International Conference on Data Engineering, ICDE 2010
Country/Territory: United States
City: Long Beach, CA
Period: 3/1/10 to 3/6/10

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems
