TY - JOUR
T1 - EliXR
T2 - An approach to eligibility criteria extraction and representation
AU - Weng, Chunhua
AU - Wu, Xiaoying
AU - Luo, Zhihui
AU - Boland, Mary Regina
AU - Theodoratos, Dimitri
AU - Johnson, Stephen B.
PY - 2011/12
Y1 - 2011/12
N2 - Objective: To develop a semantic representation for clinical research eligibility criteria to automate semi-structured information extraction from eligibility criteria text. Materials and Methods: An analysis pipeline called eligibility criteria extraction and representation (EliXR) was developed that integrates syntactic parsing and tree pattern mining to discover common semantic patterns in 1000 eligibility criteria randomly selected from http://ClinicalTrials.gov. The semantic patterns were aggregated and enriched with Unified Medical Language System (UMLS) semantic knowledge to form a semantic representation for clinical research eligibility criteria. Results: The authors arrived at 175 semantic patterns, which form 12 semantic role labels connected by their frequent semantic relations in a semantic network. Evaluation: Three raters independently annotated all the sentence segments (N=396) for 79 test eligibility criteria using the 12 top-level semantic role labels. Eighty-six per cent (339) of the sentence segments were unanimously labelled correctly and 13.8% (55) were correctly labelled by two raters. Fleiss' κ was 0.88, indicating nearly perfect interrater agreement. Conclusion: This study presents a semi-automated, data-driven approach to developing a semantic network that aligns well with the top-level information structure in clinical research eligibility criteria text, and demonstrates the feasibility of using the resulting semantic role labels to generate semi-structured eligibility criteria with nearly perfect interrater reliability.
AB - Objective: To develop a semantic representation for clinical research eligibility criteria to automate semi-structured information extraction from eligibility criteria text. Materials and Methods: An analysis pipeline called eligibility criteria extraction and representation (EliXR) was developed that integrates syntactic parsing and tree pattern mining to discover common semantic patterns in 1000 eligibility criteria randomly selected from http://ClinicalTrials.gov. The semantic patterns were aggregated and enriched with Unified Medical Language System (UMLS) semantic knowledge to form a semantic representation for clinical research eligibility criteria. Results: The authors arrived at 175 semantic patterns, which form 12 semantic role labels connected by their frequent semantic relations in a semantic network. Evaluation: Three raters independently annotated all the sentence segments (N=396) for 79 test eligibility criteria using the 12 top-level semantic role labels. Eighty-six per cent (339) of the sentence segments were unanimously labelled correctly and 13.8% (55) were correctly labelled by two raters. Fleiss' κ was 0.88, indicating nearly perfect interrater agreement. Conclusion: This study presents a semi-automated, data-driven approach to developing a semantic network that aligns well with the top-level information structure in clinical research eligibility criteria text, and demonstrates the feasibility of using the resulting semantic role labels to generate semi-structured eligibility criteria with nearly perfect interrater reliability.
UR - http://www.scopus.com/inward/record.url?scp=84863557848&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863557848&partnerID=8YFLogxK
U2 - 10.1136/amiajnl-2011-000321
DO - 10.1136/amiajnl-2011-000321
M3 - Article
C2 - 21807647
AN - SCOPUS:84863557848
SN - 1067-5027
VL - 18
SP - 116
EP - 124
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - SUPPL. 1
ER -