TY - JOUR
T1 - Relating complexity and error rates of ontology concepts
T2 - More complex NCIt concepts have more errors
AU - Min, Hua
AU - Zheng, Ling
AU - Perl, Yehoshua
AU - Halper, Michael
AU - De Coronado, Sherri
AU - Ochs, Christopher
N1 - Funding Information:
Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number R01CA190779. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publisher Copyright:
© Schattauer 2017.
PY - 2017
Y1 - 2017
AB - Objectives: Ontologies are knowledge structures that support many health-information systems. We carry out a study to assess the quality of ontology concepts based on a measure of their complexity. The results show a relationship between the complexity of concepts and their error rates. Methods: A measure of lateral complexity, defined as the number of role types a concept exhibits, is used to distinguish more complex concepts from simpler ones. Using a framework called an area taxonomy, a kind of abstraction network that summarizes the structural organization of an ontology, concepts are divided into two groups along these lines. Concepts from each group are then subjected to a two-phase QA analysis to uncover and verify errors and inconsistencies in their modeling. A hierarchy of the National Cancer Institute Thesaurus (NCIt) serves as our test-bed. We test a hypothesis about the expected error rates of the complex and the simple concepts. Results: Our study was done on the NCIt’s Biological Process hierarchy. Various errors, including missing roles, incorrect role targets, and incorrectly assigned roles, were discovered and verified in the two phases of our QA analysis. The overall findings confirmed our hypothesis, showing a statistically significant difference between the numbers of errors exhibited by more laterally complex concepts vis-à-vis simpler concepts. Conclusions: QA is an essential part of any ontology’s maintenance regimen. In this paper, we report on the results of a QA study targeting two groups of ontology concepts distinguished by their level of complexity, defined as the number of exhibited role types. The study was carried out on a major component of an important ontology, the NCIt. The findings suggest that more complex concepts tend to have a higher error rate than simpler concepts. These findings can be used to guide ongoing ontology QA efforts.
KW - Abstraction network
KW - National Cancer Institute Thesaurus
KW - Ontology complexity
KW - Ontology modeling
KW - Ontology quality assurance
UR - http://www.scopus.com/inward/record.url?scp=85019940285&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019940285&partnerID=8YFLogxK
U2 - 10.3414/ME16-01-0085
DO - 10.3414/ME16-01-0085
M3 - Article
C2 - 28244549
AN - SCOPUS:85019940285
SN - 0026-1270
VL - 56
SP - 200
EP - 208
JO - Methods of Information in Medicine
JF - Methods of Information in Medicine
IS - 3
ER -