TY - GEN
T1 - Histogram difference string distance for enhancing ontology integration in bioinformatics
AU - Rudniy, Alex
AU - Geller, James
AU - Song, Min
PY - 2012
Y1 - 2012
N2 - Integration of bioinformatics ontologies is an important research task. This paper presents a family of new string distance methods for improving existing ontology integration and alignment techniques. A histogram, the main tool of the introduced methods, is an associative array that stores the number of occurrences of each character in a string. We combine histogram difference with Longest Common Prefix, TFIDF, Smith-Waterman, and Jaccard re-scorers to define the four members of our family of string matching methods. We compare the performance of our methods with several well-known string matching algorithms, using five Gene Ontology datasets as test beds. Our methods outperformed those algorithms in average precision on four datasets and in maximum F1 measure on three datasets. On the remaining datasets, our results were among the best of the compared methods.
UR - http://www.scopus.com/inward/record.url?scp=84883638335&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84883638335&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84883638335
SN - 9781618397461
T3 - 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012
SP - 108
EP - 113
BT - 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012
T2 - 4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012
Y2 - 12 March 2012 through 14 March 2012
ER -