TY - GEN
T1 - Using Generative Large Language Models for Hierarchical Relationship Prediction in Medical Ontologies
AU - Liu, Hao
AU - Zhou, Shuxin
AU - Chen, Zhehuan
AU - Perl, Yehoshua
AU - Wang, Jiayin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This study extends the exploration of ontology enrichment by evaluating the performance of various open-source Large Language Models (LLMs) on the task of predicting hierarchical (IS-A) relationships in medical ontologies, including the SNOMED CT Clinical Finding and Procedure hierarchies and the Human Disease Ontology. With previously finetuned BERT models for hierarchical relationship prediction as the baseline, we assessed eight open-source generative LLMs on the same task. We observed that only three models, without finetuning, demonstrated performance comparable or superior to the baseline BERT-based models. The best-performing model, OpenChat, achieved a macro-average F1 score of 0.96 (0.95) on the SNOMED CT Clinical Finding (Procedure) hierarchy, an increase of over 7% from the baseline of 0.89 (0.85). On the Human Disease Ontology, OpenChat excelled with an F1 score of 0.91, outperforming the second-best model, Vicuna (0.84). Notably, some LLMs proved unsuitable for hierarchical relationship prediction tasks or for concept placement in medical ontologies. We also explored various prompt templates and ensemble techniques to uncover potential confounding factors in applying LLMs to IS-A relation prediction for medical ontologies.
AB - This study extends the exploration of ontology enrichment by evaluating the performance of various open-source Large Language Models (LLMs) on the task of predicting hierarchical (IS-A) relationships in medical ontologies, including the SNOMED CT Clinical Finding and Procedure hierarchies and the Human Disease Ontology. With previously finetuned BERT models for hierarchical relationship prediction as the baseline, we assessed eight open-source generative LLMs on the same task. We observed that only three models, without finetuning, demonstrated performance comparable or superior to the baseline BERT-based models. The best-performing model, OpenChat, achieved a macro-average F1 score of 0.96 (0.95) on the SNOMED CT Clinical Finding (Procedure) hierarchy, an increase of over 7% from the baseline of 0.89 (0.85). On the Human Disease Ontology, OpenChat excelled with an F1 score of 0.91, outperforming the second-best model, Vicuna (0.84). Notably, some LLMs proved unsuitable for hierarchical relationship prediction tasks or for concept placement in medical ontologies. We also explored various prompt templates and ensemble techniques to uncover potential confounding factors in applying LLMs to IS-A relation prediction for medical ontologies.
KW - Hierarchical Relation Prediction
KW - Large Language Models
KW - Medical Ontology
KW - Prompt Design
KW - SNOMED CT
UR - http://www.scopus.com/inward/record.url?scp=85203690166&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203690166&partnerID=8YFLogxK
U2 - 10.1109/ICHI61247.2024.00040
DO - 10.1109/ICHI61247.2024.00040
M3 - Conference contribution
AN - SCOPUS:85203690166
T3 - Proceedings - 2024 IEEE 12th International Conference on Healthcare Informatics, ICHI 2024
SP - 248
EP - 256
BT - Proceedings - 2024 IEEE 12th International Conference on Healthcare Informatics, ICHI 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th IEEE International Conference on Healthcare Informatics, ICHI 2024
Y2 - 3 June 2024 through 6 June 2024
ER -