Cross-validation and cross-study validation of chronic lymphocytic leukemia with exome sequences and machine learning

Nihir Patel, Bharati Jhadav, Abdulrhman Aljouie, Usman Roshan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

The era of genomics brings the potential for better DNA-based risk prediction and treatment. While genome-wide association studies have been studied extensively for risk prediction, the potential of whole exome data for this purpose is unclear. We explore this problem for chronic lymphocytic leukemia, using one of the largest whole exome datasets available from the NIH dbGaP database, with 186 cases and 169 controls. We apply a standard next-generation sequencing procedure to obtain SNP variants for 153 cases and 144 controls after excluding samples with missing data. To evaluate their predictive power, we first conduct a cross-validation study with 50% training and 50% test splits on the full dataset, using the support vector machine as the classifier. There we obtain a mean accuracy of 82% with the top 20 SNPs ranked by the Pearson correlation coefficient. We then perform a cross-study validation on cases and controls from an external lymphoma study and on controls only from head and neck cancer and breast cancer studies (all obtained from NIH dbGaP). On the external dataset we obtain an accuracy of 70% with the top-ranked SNPs obtained from the original dataset. We also find that our top Pearson-ranked SNPs lie on genes previously implicated in this disease. Our study shows that even with a small sample size we can obtain moderate to high accuracy with exome sequences, which is encouraging for future work.
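
The evaluation protocol described in the abstract (rank SNPs by Pearson correlation with case/control status, keep the top-ranked ones, train a support vector machine, and average accuracy over repeated 50%/50% splits) can be illustrated with a minimal sketch. This is not the authors' pipeline. It runs on synthetic genotypes, all function names are hypothetical, the genotype coding (-1/0/1) is an assumption, and the classifier is a linear SVM trained by plain stochastic subgradient descent on the hinge loss, standing in for whatever SVM solver and kernel the study used. Ranking SNPs on the training half only is also an assumption about the protocol.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(vx * vy)

def rank_snps(X, y, k):
    """Indices of the k columns of X with the largest |r| against the labels."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=100, seed=0):
    """Linear SVM via stochastic subgradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # last weight acts as the bias term
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [wj * (1 - eta * lam) for wj in w]  # shrink (regulariser)
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x):
    """Predicted label (+1 case, -1 control) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x + [1.0])) >= 0 else -1

def split_accuracy(X, y, k, rng):
    """Accuracy on the held-out half of one random 50%/50% split."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
    top = rank_snps(Xtr, ytr, k)  # assumption: rank on the training half only
    w = train_linear_svm([[X[i][j] for j in top] for i in train], ytr)
    hits = sum(predict(w, [X[i][j] for j in top]) == y[i] for i in test)
    return hits / len(test)

def make_toy_data(n=200, m=40, informative=5, seed=0):
    """Synthetic genotypes coded -1/0/1; the first few SNPs track the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = []
        for j in range(m):
            if j < informative:
                agree = 1 if rng.random() < 0.9 else -1
                row.append(float(agree * label))
            else:
                row.append(float(rng.choice([-1, 0, 1])))
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
rng = random.Random(1)
accs = [split_accuracy(X, y, k=5, rng=rng) for _ in range(10)]
mean_acc = sum(accs) / len(accs)
print(f"mean accuracy over {len(accs)} splits: {mean_acc:.2f}")
```

With real data, the rows of X would be samples and the columns SNP genotypes, and k would be set to the number of top-ranked SNPs to keep (20 in the abstract). The accuracy printed here reflects only the synthetic data and says nothing about the study's results.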

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015
Editors: Ing. Matthieu Schapranow, Jiayu Zhou, Xiaohua Tony Hu, Bin Ma, Sanguthevar Rajasekaran, Satoru Miyano, Illhoi Yoo, Brian Pierce, Amarda Shehu, Vijay K. Gombar, Brian Chen, Vinay Pai, Jun Huan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1367-1374
Number of pages: 8
ISBN (Electronic): 9781467367981
DOIs
State: Published - Dec 16 2015
Event: IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015 - Washington, United States
Duration: Nov 9 2015 → Nov 12 2015

Publication series

Name: Proceedings - 2015 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015

Other

Other: IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2015
Country: United States
City: Washington
Period: 11/9/15 → 11/12/15

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
  • Health Informatics
  • Biomedical Engineering
