The UMLS contains terms from many sources. Every update of a source requires reintegration. Each new term needs to be assigned to a preexisting UMLS concept, or a new concept must be created. Whenever the integration process unnecessarily creates a new concept, this is undesirable. We report on a method to detect such undesirable duplicate concepts. Terms are removed from the UMLS and reintegrated using "piecewise synonym generation." The concept of the reintegrated term is programmatically compared to the initial concept of the term (before removal). If they are different, this indicates an error, either in the integration process or in the initial concept. Thus, such a term-concept pair is deemed suspicious. A study of five hierarchies of the SNOMED found 7.7% suspicious matches. A human expert needs to evaluate the correctness of suspicious concepts. In a sample of 149 of those, 19% of concepts were found to be duplicates.
|Original language||English (US)|
|Number of pages||5|
|Journal||AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium|
|State||Published - Jan 1 2010|
All Science Journal Classification (ASJC) codes