ALIAS: Author disambiguation in Microsoft academic search engine dataset

Michael Pitts, Swapna Savvana, Senjuti Basu Roy, Vani Mandava, Dhineshkumar Prasath

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

We present ALIAS, a system designed to find duplicate authors in the Microsoft Academic Search Engine dataset. Author ambiguity is prevalent in this dataset: many authors publish under several variations of their own name, and different authors share similar or identical names. ALIAS takes an author name as input (the author may or may not exist in the corpus) and outputs a set of author names from the database that are determined to be duplicates of the input author, each with a confidence score. ALIAS can also return a Top-k list of similar authors for a given input author name. The underlying techniques combine learning, mining, and efficient search, including partitioning, clustering, supervised learning using ensemble algorithms, and indexing, to provide fast responses for near-real-time user interaction. While the system is designed using Academic Search Engine data, the proposed solution is generic and could be extended to other entity disambiguation problems. In this demonstration paper, we describe the components of ALIAS and the algorithms associated with each of them for author name disambiguation and similar-author finding.
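As a rough illustration of the query flow the abstract describes (partition the corpus, look up candidates for an input name, attach a confidence score, return a Top-k list), here is a minimal Python sketch. It is not the ALIAS implementation: the abstract mentions supervised ensemble learning for scoring, whereas this sketch substitutes a plain string-similarity ratio from the standard library. The names `block_key`, `build_index`, `similarity`, and `top_k_candidates`, and the choice of the last name token as the partitioning key, are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
import heapq


def block_key(name: str) -> str:
    """Hypothetical partitioning key: the lowercased last token of the name."""
    tokens = name.lower().replace(".", " ").split()
    return tokens[-1] if tokens else ""


def build_index(names):
    """Group corpus names by block key so a query scans only one partition."""
    index = defaultdict(list)
    for n in names:
        index[block_key(n)].append(n)
    return index


def similarity(a: str, b: str) -> float:
    """Stand-in confidence score in [0, 1]; an assumption, not a learned model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def top_k_candidates(query: str, index, k: int = 3, threshold: float = 0.0):
    """Return up to k (score, name) pairs from the query's partition, best first."""
    candidates = index.get(block_key(query), [])
    scored = [(similarity(query, c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s >= threshold]
    return heapq.nlargest(k, scored)


# Illustrative corpus built from name variants; the variants are made up.
corpus = [
    "Senjuti Basu Roy", "S. Basu Roy", "Senjuti B. Roy",
    "Michael Pitts", "M. Pitts", "Vani Mandava",
]
index = build_index(corpus)
for score, name in top_k_candidates("Senjuti Roy", index, k=2):
    print(f"{name}: {score:.2f}")
```

The query name does not need to appear in the corpus, which mirrors the abstract's statement that the input author may or may not exist in the database. A name whose partition is empty simply yields an empty list.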

Original language: English (US)
Title of host publication: Advances in Database Technology - EDBT 2014
Subtitle of host publication: 17th International Conference on Extending Database Technology, Proceedings
Editors: Vincent Leroy, Vassilis Christophides, Stratos Idreos, Anastasios Kementsietsidis, Minos Garofalakis, Sihem Amer-Yahia
Publisher: OpenProceedings.org, University of Konstanz, University Library
Pages: 648-651
Number of pages: 4
ISBN (Electronic): 9783893180653
State: Published - 2014
Externally published: Yes
Event: 17th International Conference on Extending Database Technology, EDBT 2014 - Athens, Greece
Duration: Mar 24 2014 to Mar 28 2014

Publication series

Name: Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings

Other

Other: 17th International Conference on Extending Database Technology, EDBT 2014
Country/Territory: Greece
City: Athens
Period: 3/24/14 to 3/28/14

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Information Systems
  • Software
