TY - JOUR
T1 - A comparative analysis of the density of the SNOMED CT conceptual content for semantic harmonization
AU - He, Zhe
AU - Geller, James
AU - Chen, Yan
N1 - Funding Information:
We want to thank Drs. Yehoshua Perl, Michael Halper, Mei Liu, Gai Elhanan, and Chunhua Weng for providing their feedback and sharing their insights for this work. We also want to thank three anonymous referees for providing comprehensive comments that have significantly improved the quality of this manuscript. This work was partially supported by the U.S. National Cancer Institute of the National Institutes of Health under award number R01CA190779. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publisher Copyright:
© 2015 Elsevier B.V.
PY - 2015/5/1
Y1 - 2015/5/1
N2 - Objectives: Medical terminologies vary in the amount of concept information (the "density") represented, even in the same sub-domains. This causes problems in terminology mapping, semantic harmonization and terminology integration. Moreover, complex clinical scenarios need to be encoded by a medical terminology with comprehensive content. SNOMED Clinical Terms (SNOMED CT), a leading clinical terminology, was reported to lack concepts and synonyms, problems that cannot be fully alleviated by using post-coordination. Therefore, a scalable solution is needed to enrich the conceptual content of SNOMED CT. We are developing a structure-based, algorithmic method to identify potential concepts for enriching the conceptual content of SNOMED CT and to support semantic harmonization of SNOMED CT with selected other Unified Medical Language System (UMLS) terminologies. Methods: We first identified a subset of English terminologies in the UMLS that have 'PAR' relationships labeled 'IS_A' and over 10% overlap with one or more of the 19 hierarchies of SNOMED CT. We call these "reference terminologies" and we note that our use of this name is different from the standard use. Next, we defined a set of topological patterns across pairs of terminologies, with SNOMED CT being one terminology in each pair and the other being one of the reference terminologies. We then explored how often these topological patterns appear between SNOMED CT and each reference terminology, and how to interpret them. Results: Four viable reference terminologies were identified. Large density differences between terminologies were found. Expected interpretations of these differences were indeed observed, as follows. A random sample of 299 instances of special topological patterns ("2:3 and 3:2 trapezoids") showed that 39.1% and 59.5% of analyzed concepts in SNOMED CT and in a reference terminology, respectively, were deemed to be alternative classifications of the same conceptual content. In 30.5% and 17.6% of the cases, it was found that intermediate concepts could be imported into SNOMED CT or into the reference terminology, respectively, to enhance their conceptual content, if approved by a human curator. Other cases included synonymy and errors in one of the terminologies. Conclusion: These results show that structure-based algorithmic methods can be used to identify potential concepts to enrich SNOMED CT and the four reference terminologies. The comparative analysis has the future potential of supporting terminology authoring by suggesting new content to improve content coverage and semantic harmonization between terminologies.
KW - Biomedical terminology
KW - SNOMED CT
KW - Semantic harmonization
KW - Semantic interoperability
KW - Structural methodology
KW - UMLS
UR - http://www.scopus.com/inward/record.url?scp=84930084847&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84930084847&partnerID=8YFLogxK
U2 - 10.1016/j.artmed.2015.03.002
DO - 10.1016/j.artmed.2015.03.002
M3 - Article
C2 - 25890688
AN - SCOPUS:84930084847
SN - 0933-3657
VL - 64
SP - 29
EP - 40
JO - Artificial Intelligence in Medicine
JF - Artificial Intelligence in Medicine
IS - 1
ER -