Cross-validation and cross-study validation of kidney cancer with machine learning and whole exome sequences from the National Cancer Institute

Abdulrhman Aljouie, Usman Roshan, Nihir Patel

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

3 Scopus citations

Abstract

Accurate cancer risk prediction from genetic and environmental variables is a key problem in medicine. One approach is to use somatic mutations, which could potentially be used for early detection and prevention. SNP-based studies are the most common ones using this approach; however, most studies lack a cross-study validation component across at least two independent studies. Here we explore cross-validation and cross-study validation of predicting kidney cancer cases and controls with SNPs obtained from whole exome sequences at the National Cancer Institute. From the Genomic Data Commons (GDC) portal we obtained aligned whole exome sequences from two different kidney cancer studies: 110 cases and controls of KIRP (renal papillary cell carcinoma) and 34 cases and controls of KICH (kidney chromophobe cell carcinoma). We performed a rigorous quality control procedure to obtain SNPs and ranked them with feature selection. On the top-ranked SNPs, the support vector machine obtains a cross-validation accuracy of 71% (with 10 SNPs) in KIRP and 72% (with 20 SNPs) in KICH. We then learn a model on KIRP and, with 10 SNPs, achieve an accuracy of 66% on the KICH samples. Our work shows that we can predict kidney chromophobe carcinoma from a kidney papillary carcinoma dataset with better-than-random accuracy (random classification would give 50%). In continuing work we are expanding these sample sizes and extending cross-study validation to other kidney cancer datasets in the NCI GDC portal.
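The two evaluation settings in the abstract (cross-validation within one study, and training on KIRP then testing on KICH) can be sketched in scikit-learn as below. This is a minimal illustration, not the authors' code: the abstract does not name the feature selection method, SVM kernel, regularization, or number of folds, so the chi-square ranking (`SelectKBest(chi2)`), the linear SVM with `C=1.0`, the 5 folds, the 0/1/2 genotype coding, and the function names are all assumptions. Selection is placed inside the pipeline so that SNPs are ranked on the training folds only; ranking on all samples first would leak information and inflate the cross-validation accuracy.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def snp_svm_pipeline(k):
    """Hypothetical pipeline: rank SNPs by chi-square score, keep the top k,
    then fit a linear SVM (both choices are assumptions, not from the paper).

    Assumes genotypes are coded as minor-allele counts 0/1/2, which keeps the
    features non-negative as chi2 requires.
    """
    return make_pipeline(SelectKBest(chi2, k=k), LinearSVC(C=1.0, dual=False))


def cv_accuracy(X, y, k, folds=5, seed=0):
    """Mean stratified k-fold cross-validation accuracy within a single study.

    SNP ranking is refit inside each training fold because it is part of the pipeline.
    """
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = cross_val_score(snp_svm_pipeline(k), X, y, cv=cv, scoring="accuracy")
    return float(np.mean(scores))


def cross_study_accuracy(X_train, y_train, X_test, y_test, k):
    """Train on one study (e.g. KIRP) and score on an independent one (e.g. KICH).

    Both genotype matrices must contain the same SNP columns in the same order.
    """
    model = snp_svm_pipeline(k).fit(X_train, y_train)
    return float(model.score(X_test, y_test))


if __name__ == "__main__":
    # Purely synthetic, random genotypes for illustration; not the GDC data.
    rng = np.random.default_rng(0)
    y_a = np.repeat([0, 1], 30)
    y_b = np.repeat([0, 1], 20)
    X_a = rng.integers(0, 3, size=(y_a.size, 200))
    X_b = rng.integers(0, 3, size=(y_b.size, 200))
    print("within-study CV accuracy:", cv_accuracy(X_a, y_a, k=10))
    print("cross-study accuracy:", cross_study_accuracy(X_a, y_a, X_b, y_b, k=10))
```

On random genotypes like the demo's, both accuracies should hover around the 50% chance level mentioned in the abstract; any real signal has to come from SNPs that are genuinely associated with case status in both studies.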

Original language: English (US)
Title of host publication: 2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781538613993
Status: Published, Jul 5 2018
Event: 2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2018, Saint Louis, United States
Duration: May 30 2018 to Jun 2 2018

Publication series

Name: 2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2018

All Science Journal Classification (ASJC) codes

  • Agricultural and Biological Sciences (miscellaneous)
  • Artificial Intelligence
  • Computer Science Applications
  • Health Informatics
