Machine learning based prediction of gliomas with germline mutations obtained from whole exome sequences from TCGA and 1000 Genomes Project

Abdulrhman Aljouie, Michael Schatz, Usman Roshan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

Germline variants can be useful early predictors of cancer risk. Here we present cross-study validation and cross-validation for two brain cancers: Glioblastoma Multiforme (GBM) and Lower Grade Glioma (LGG). We obtained whole exome germline sequences of European ancestry individuals with these cancers from The Cancer Genome Atlas (TCGA) and of European ancestry control individuals from the 1000 Genomes Project. We applied a rigorous, quality-controlled GATK procedure to obtain variants, with which we performed cross-study and cross-validation experiments. We find our germline variants to be highly predictive of both cancers in cross-study validation as well as in cross-validation. Predicting LGG+controls from GBM+controls gives 89% accuracy, and predicting in the reverse direction gives 88% accuracy, both with a linear support vector machine classifier. We find that the bulk of the accuracy comes from the SNP rs10792053, which lies in the gene OR9G1. In controls, this SNP is in Hardy-Weinberg equilibrium and has allele frequencies similar to previously published values, but this is not so in our cases. Our manual inspection of alignments reveals nothing unusual in the cases. Our other top-ranked SNPs lie in genes known to be connected to brain cancer and to cancer in general. Our study shows a highly discriminative germline SNP for GBM and LGG, but it requires replication studies for further verification.
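The Hardy-Weinberg equilibrium check mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts for a biallelic SNP, and the function name `hwe_chi_square` and all counts used with it are hypothetical, chosen only for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Arguments are observed genotype counts (hypothetical inputs):
    n_aa = homozygous for allele A, n_ab = heterozygous,
    n_bb = homozygous for allele B.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")

    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)

    # Skip genotype classes with zero expected count (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Illustrative counts only, not data from the study.
balanced = hwe_chi_square(25, 50, 25)   # matches HWE proportions exactly
skewed = hwe_chi_square(50, 0, 50)      # no heterozygotes at all
```

A small p-value, as in the `skewed` example, indicates departure from equilibrium. For small samples or rare alleles an exact test is usually preferred over this chi-square approximation.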

Original language: English (US)
Title of host publication: 2019 3rd International Conference on Intelligent Computing in Data Sciences, ICDS 2019
Editors: Plamen Angelov, Jaouad Boumhidi, Hani Hagras, El Habib Nfaoui, Youness Oubenaalla, Chakir Loqman, Mohammed Mestari, Hajar Mousannif
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728100036
State: Published - Oct 2019
Event: 3rd International Conference on Intelligent Computing in Data Sciences, ICDS 2019 - Marrakech, Morocco
Duration: Oct 28 2019 → Oct 30 2019

Publication series

Name: 2019 3rd International Conference on Intelligent Computing in Data Sciences, ICDS 2019

Conference

Conference: 3rd International Conference on Intelligent Computing in Data Sciences, ICDS 2019
Country/Territory: Morocco
City: Marrakech
Period: 10/28/19 → 10/30/19

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Modeling and Simulation
  • Control and Optimization

Keywords

  • 1000 Genomes
  • GBM
  • LGG
  • Prediction
  • Whole Exome Sequencing
