TY - JOUR
T1 - Complex overlapping concepts
T2 - An effective auditing methodology for families of similarly structured BioPortal ontologies
AU - Zheng, Ling
AU - Chen, Yan
AU - Elhanan, Gai
AU - Perl, Yehoshua
AU - Geller, James
AU - Ochs, Christopher
N1 - Publisher Copyright: © 2018 Elsevier Inc.
PY - 2018/7
Y1 - 2018/7
N2 - In previous research, we have demonstrated for a number of ontologies that structurally complex concepts (for different definitions of “complex”) in an ontology are more likely to exhibit errors than other concepts. Thus, such complex concepts are fertile ground for quality assurance (QA) in ontologies and should be audited first. One example of complex concepts is given by “overlapping concepts” (to be defined below). Historically, a different auditing methodology had to be developed for every single ontology. For better scalability and efficiency, it is desirable to identify family-wide QA methodologies, each applicable to a whole family of similar ontologies. In past research, we divided the 685 ontologies of BioPortal into families of structurally similar ontologies. We showed for four ontologies of the same large family in BioPortal that “overlapping concepts” are indeed statistically significantly more likely to exhibit errors. In order to make an authoritative statement concerning the success of “overlapping concepts” as a methodology for a whole family of similar ontologies (or of large subhierarchies of ontologies), it is necessary to show that “overlapping concepts” have a higher likelihood of errors for six out of six ontologies of the family. In this paper, we demonstrate for two more ontologies that “overlapping concepts” can successfully predict groups of concepts with a higher error rate than concepts from a control group. The fifth ontology is the Neoplasm subhierarchy of the National Cancer Institute thesaurus (NCIt). The sixth ontology is the Infectious Disease subhierarchy of SNOMED CT. We present quality assurance results for both of them.
Furthermore, in this paper we observe two novel, important, and useful phenomena during quality assurance of “overlapping concepts.” First, an erroneous “overlapping concept” can help with discovering other erroneous “non-overlapping concepts” in its vicinity. Second, correcting erroneous “overlapping concepts” may turn them into “non-overlapping concepts.” We demonstrate that this may reduce the complexity of parts of the ontology, which in turn makes the ontology more comprehensible and simplifies its maintenance and use.
KW - Abstraction network
KW - Family-based ontology quality assurance
KW - National Cancer Institute thesaurus
KW - Ontology auditing
KW - Ontology quality assurance
KW - SNOMED CT
UR - http://www.scopus.com/inward/record.url?scp=85048256040&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048256040&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2018.05.015
DO - 10.1016/j.jbi.2018.05.015
M3 - Article
C2 - 29852316
AN - SCOPUS:85048256040
SN - 1532-0464
VL - 83
SP - 135
EP - 149
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
ER -