Histogram difference string distance for enhancing ontology integration in bioinformatics

Alex Rudniy, James Geller, Min Song

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Integration of bioinformatics ontologies is an important research task. This paper presents a family of new methods of string distance computation for improving existing ontology integration and alignment techniques. A histogram, the main tool of the introduced methods, is an associative array for storing the number of occurrences of each character in a string. We use histogram difference in combination with Longest Common Prefix, TFIDF, Smith-Waterman, and Jaccard re-scorers to define the four members of our family of string matching methods. We compare the performance of our methods with several well-known string matching algorithms using five Gene Ontology datasets as test beds. Our methods outperformed those algorithms in terms of average precision on four datasets and for maximum F1 measure on three datasets. On the remaining datasets our results were among the best, compared to these well-known methods.
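The abstract describes a histogram as an associative array of per-character occurrence counts, but it does not give the exact distance formula. The following is a minimal sketch under a stated assumption: the histogram difference is taken here as the sum of absolute differences between per-character counts, and the normalization by combined string length is likewise an illustrative choice. The function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_histogram(s: str) -> Counter:
    """Associative array mapping each character of s to its occurrence count."""
    return Counter(s)


def histogram_difference(a: str, b: str) -> int:
    """Sum of absolute per-character count differences.

    ASSUMPTION: the abstract does not define the distance; an L1 difference
    between the two histograms is used here purely for illustration.
    """
    ha, hb = char_histogram(a), char_histogram(b)
    # Counter returns 0 for characters missing from one of the strings.
    return sum(abs(ha[c] - hb[c]) for c in ha.keys() | hb.keys())


def histogram_similarity(a: str, b: str) -> float:
    """Map the difference to [0, 1]; 1.0 means identical histograms.

    ASSUMPTION: normalizing by the combined length is an illustrative choice.
    """
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 1.0 - histogram_difference(a, b) / total


if __name__ == "__main__":
    print(histogram_difference("kinase", "kinases"))
    print(histogram_similarity("kinase", "kinases"))
```

Under this assumed definition, strings that are anagrams of each other have a difference of zero, because a histogram alone ignores character order.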

Original language: English (US)
Title of host publication: 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012
Pages: 108-113
Number of pages: 6
State: Published - 2012
Event: 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012 - Las Vegas, NV, United States
Duration: Mar 12 2012 → Mar 14 2012

Publication series

Name: 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Health Information Management
