Discovering additional complex NCIt gene concepts with high error rate

Ling Zheng, Hua Min, Yehoshua Perl, James Geller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

The Gene hierarchy of the National Cancer Institute (NCI) Thesaurus (NCIt) is a high priority for NCI, so quality assurance (QA) techniques that improve its content are important. We present a two-step methodology for auditing the modeling of complex concepts, which are shown to have a higher error rate than control concepts. In the first step, we test whether concepts that appear complex in a so-called 'partial-area taxonomy' have a higher error rate than control concepts. In the second step, we introduce an innovative technique based on a 'partial-area sub-taxonomy' (constructed with a subset of roles) to discover additional complex concepts. The results of the QA study show that these concepts are statistically significantly more likely to have errors than control concepts. Focusing on such concepts makes it easier for NCI staff to improve the modeling quality of gene concepts in NCIt.

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Editors: Illhoi Yoo, Jane Huiru Zheng, Yang Gong, Xiaohua Tony Hu, Chi-Ren Shyu, Yana Bromberg, Jean Gao, Dmitry Korkin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 653-657
Number of pages: 5
ISBN (Electronic): 9781509030491
State: Published - Dec 15 2017
Event: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 - Kansas City, United States
Duration: Nov 13 2017 → Nov 16 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Volume: 2017-January

Other

Other: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Country: United States
City: Kansas City
Period: 11/13/17 → 11/16/17

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Health Informatics

Keywords

  • Gene hierarchy
  • National Cancer Institute Thesaurus
  • auditing software
  • complex concepts
  • quality assurance
