TY - JOUR
T1 - Transfer Learning from BERT to Support Insertion of New Concepts into SNOMED CT
AU - Liu, Hao
AU - Perl, Yehoshua
AU - Geller, James
N1 - Publisher Copyright:
© 2019 AMIA - All rights reserved.
PY - 2019
Y1 - 2019
AB - With advances in Machine Learning (ML), neural network-based methods, such as Convolutional and Recurrent Neural Networks, have been proposed to assist terminology curators in the development and maintenance of terminologies. Bidirectional Encoder Representations from Transformers (BERT), a new language representation model, obtains state-of-the-art results on a wide array of general English NLP tasks. We explore BERT's applicability to medical terminology-related tasks. Utilizing the "next sentence prediction" capability of BERT, we show that the fine-tuning strategy of Transfer Learning (TL) from the BERT-Base model can address a challenging problem in automatic terminology enrichment: the insertion of new concepts. Adding a pre-training strategy enhances the results. We apply our strategies to the two largest hierarchies of SNOMED CT, using one release as training data and the following release as test data. Combined, the two proposed TL models achieve average F1 scores of 0.85 and 0.86 for the two hierarchies, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85083755680&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083755680&partnerID=8YFLogxK
M3 - Article
C2 - 32308910
AN - SCOPUS:85083755680
SN - 1559-4076
VL - 2019
SP - 1129
EP - 1138
JO - AMIA ... Annual Symposium proceedings. AMIA Symposium
JF - AMIA ... Annual Symposium proceedings. AMIA Symposium
ER -