Shortest Path Edit Distance for detecting duplicate biological entities

Alex Rudniy, Min Song, James Geller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

This paper presents a novel, context-sensitive Shortest Path Edit Distance (SPED) and applies it to duplicate entity detection in biological data. SPED extends the Markov Random Field-based Edit Distance: it transforms the edit distance computation into the calculation of the shortest path between two selected vertices of a graph. The experimental results show that SPED produces competitive outcomes. Soft-SPED, the combination of SPED with TFIDF, achieves superior performance in most cases.
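
The reduction of edit distance to a shortest-path problem can be illustrated with the textbook edit graph. The sketch below is not the authors' SPED, which is context-sensitive and builds on a Markov Random Field model. It assumes uniform, context-free costs, and the function name and cost parameters are hypothetical. Each vertex (i, j) means "the first i characters of a and the first j characters of b have been consumed"; edges correspond to deletion, insertion, and match or substitution; Dijkstra's algorithm finds the cheapest path from (0, 0) to (len(a), len(b)).

```python
import heapq


def edit_distance_shortest_path(a, b, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Edit distance computed as a shortest path in the edit graph.

    Illustrative only: costs are uniform and context-free (an assumption),
    unlike the context-sensitive costs described in the abstract.
    """
    source, target = (0, 0), (len(a), len(b))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == target:
            return d
        if d > dist.get((i, j), float("inf")):
            continue  # stale heap entry, a shorter path was already found
        edges = []
        if i < len(a):
            edges.append(((i + 1, j), del_cost))  # delete a[i]
        if j < len(b):
            edges.append(((i, j + 1), ins_cost))  # insert b[j]
        if i < len(a) and j < len(b):
            # Diagonal edge: free on a match, sub_cost on a mismatch.
            cost = 0.0 if a[i] == b[j] else sub_cost
            edges.append(((i + 1, j + 1), cost))
        for node, weight in edges:
            new_dist = d + weight
            if new_dist < dist.get(node, float("inf")):
                dist[node] = new_dist
                heapq.heappush(heap, (new_dist, node))
    return dist[target]
```

With unit costs this reproduces the ordinary Levenshtein distance. A context-sensitive variant would make each edge weight depend on neighboring characters instead of using fixed constants.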

Original language: English (US)
Title of host publication: 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010
Pages: 442-444
Number of pages: 3
State: Published - Oct 25 2010
Event: 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010 - Niagara Falls, NY, United States
Duration: Aug 2 2010 to Aug 4 2010

Publication series

Name: 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010

Other

Other: 2010 ACM International Conference on Bioinformatics and Computational Biology, ACM-BCB 2010
Country: United States
City: Niagara Falls, NY
Period: 8/2/10 to 8/4/10

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Health Information Management

Keywords

  • Algorithm
  • Biological entities
  • Duplicate detection
  • Edit distance
