TY - GEN
T1 - ALIAS
T2 - 17th International Conference on Extending Database Technology, EDBT 2014
AU - Pitts, Michael
AU - Savvana, Swapna
AU - Roy, Senjuti Basu
AU - Mandava, Vani
AU - Prasath, Dhineshkumar
PY - 2014
Y1 - 2014
N2 - We present ALIAS, a system designed to search for duplicate authors in the Microsoft Academic Search Engine dataset. Author ambiguity is a prevalent problem in this dataset: many authors publish under several variations of their own name, and different authors share similar or identical names. ALIAS takes an author name as input (the author may or may not exist in the corpus) and outputs a set of author names from the database that are determined to be duplicates of the input author, each with a confidence score. Additionally, given an input author name, ALIAS can find a Top-k list of similar authors. The underlying techniques rely heavily on a mix of learning, mining, and efficient search techniques, including partitioning, clustering, supervised learning using ensemble algorithms, and indexing, to enable fast responses for near-real-time user interaction. While the system is designed using Academic Search Engine data, the proposed solution is generic and could be extended to other entity disambiguation problems. In this demonstration paper, we describe the components of ALIAS and the intelligent algorithms associated with each component for author name disambiguation and similar-author finding.
AB - We present ALIAS, a system designed to search for duplicate authors in the Microsoft Academic Search Engine dataset. Author ambiguity is a prevalent problem in this dataset: many authors publish under several variations of their own name, and different authors share similar or identical names. ALIAS takes an author name as input (the author may or may not exist in the corpus) and outputs a set of author names from the database that are determined to be duplicates of the input author, each with a confidence score. Additionally, given an input author name, ALIAS can find a Top-k list of similar authors. The underlying techniques rely heavily on a mix of learning, mining, and efficient search techniques, including partitioning, clustering, supervised learning using ensemble algorithms, and indexing, to enable fast responses for near-real-time user interaction. While the system is designed using Academic Search Engine data, the proposed solution is generic and could be extended to other entity disambiguation problems. In this demonstration paper, we describe the components of ALIAS and the intelligent algorithms associated with each component for author name disambiguation and similar-author finding.
UR - http://www.scopus.com/inward/record.url?scp=84912007891&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84912007891&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2014.65
DO - 10.5441/002/edbt.2014.65
M3 - Conference contribution
AN - SCOPUS:84912007891
T3 - Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings
SP - 648
EP - 651
BT - Advances in Database Technology - EDBT 2014
A2 - Leroy, Vincent
A2 - Christophides, Vassilis
A2 - Idreos, Stratos
A2 - Kementsietsidis, Anastasios
A2 - Garofalakis, Minos
A2 - Amer-Yahia, Sihem
PB - OpenProceedings.org, University of Konstanz, University Library
Y2 - 24 March 2014 through 28 March 2014
ER -