TY - JOUR
T1 - From disease association to risk assessment
T2 - An optimistic view from genome-wide association studies on type 1 diabetes
AU - Wei, Zhi
AU - Wang, Kai
AU - Qu, Hui Qi
AU - Zhang, Haitao
AU - Bradfield, Jonathan
AU - Kim, Cecilia
AU - Frackleton, Edward
AU - Hou, Cuiping
AU - Glessner, Joseph T.
AU - Chiavacci, Rosetta
AU - Stanley, Charles
AU - Monos, Dimitri
AU - Grant, Struan F.A.
AU - Polychronakos, Constantin
AU - Hakonarson, Hakon
N1 - Funding Information:
We gratefully thank all the children with type 1 diabetes and their families who were enrolled in this study, as well as all the control subjects who donated blood samples to Children's Hospital of Philadelphia (CHOP) for genetic research purposes. We thank the technical staff at the Center for Applied Genomics (CAG) at CHOP for producing the genotypes used for analyses and the nursing, medical assistant, and medical staff for their invaluable help with recruitment of patient and control subjects for the study. We thank the Wellcome Trust Case Control Consortium (WTCCC) for making the Affymetrix data available for our analysis. The WTCCC is funded by Wellcome Trust award 076113, and a full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. We gratefully acknowledge the Genetics of Kidneys in Diabetes (GoKinD) study for the use of their SNP genotype data, which was obtained from the Genetic Association Information Network (GAIN) database (dbGaP, phs000018.v1.p1). The GoKinD study was conducted by the GoKinD Investigators and supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The content and conclusions presented in this manuscript do not necessarily reflect the opinions or views of the GoKinD study or the NIDDK.
PY - 2009/10
Y1 - 2009/10
N2 - Genome-wide association studies (GWAS) have been fruitful in identifying disease susceptibility loci for common and complex diseases. A remaining question is whether we can quantify individual disease risk based on genotype data, in order to facilitate personalized prevention and treatment for complex diseases. Previous studies have typically failed to achieve satisfactory performance, primarily due to the use of only a limited number of confirmed susceptibility loci. Here we propose that sophisticated machine-learning approaches with a large ensemble of markers may improve the performance of disease risk assessment. We applied a Support Vector Machine (SVM) algorithm on a GWAS dataset generated on the Affymetrix genotyping platform for type 1 diabetes (T1D) and optimized a risk assessment model with hundreds of markers. We subsequently tested this model on an independent Illumina-genotyped dataset with imputed genotypes (1,008 cases and 1,000 controls), as well as a separate Affymetrix-genotyped dataset (1,529 cases and 1,458 controls), resulting in area under ROC curve (AUC) of ~0.84 in both datasets. In contrast, poor performance was achieved when limited to dozens of known susceptibility loci in the SVM model or logistic regression model. Our study suggests that improved disease risk assessment can be achieved by using algorithms that take into account interactions between a large ensemble of markers. We are optimistic that genotype-based disease risk assessment may be feasible for diseases where a notable proportion of the risk has already been captured by SNP arrays.
AB - Genome-wide association studies (GWAS) have been fruitful in identifying disease susceptibility loci for common and complex diseases. A remaining question is whether we can quantify individual disease risk based on genotype data, in order to facilitate personalized prevention and treatment for complex diseases. Previous studies have typically failed to achieve satisfactory performance, primarily due to the use of only a limited number of confirmed susceptibility loci. Here we propose that sophisticated machine-learning approaches with a large ensemble of markers may improve the performance of disease risk assessment. We applied a Support Vector Machine (SVM) algorithm on a GWAS dataset generated on the Affymetrix genotyping platform for type 1 diabetes (T1D) and optimized a risk assessment model with hundreds of markers. We subsequently tested this model on an independent Illumina-genotyped dataset with imputed genotypes (1,008 cases and 1,000 controls), as well as a separate Affymetrix-genotyped dataset (1,529 cases and 1,458 controls), resulting in area under ROC curve (AUC) of ~0.84 in both datasets. In contrast, poor performance was achieved when limited to dozens of known susceptibility loci in the SVM model or logistic regression model. Our study suggests that improved disease risk assessment can be achieved by using algorithms that take into account interactions between a large ensemble of markers. We are optimistic that genotype-based disease risk assessment may be feasible for diseases where a notable proportion of the risk has already been captured by SNP arrays.
UR - http://www.scopus.com/inward/record.url?scp=73449129712&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=73449129712&partnerID=8YFLogxK
U2 - 10.1371/journal.pgen.1000678
DO - 10.1371/journal.pgen.1000678
M3 - Article
C2 - 19816555
AN - SCOPUS:73449129712
SN - 1553-7390
VL - 5
JO - PLoS Genetics
JF - PLoS Genetics
IS - 10
M1 - e1000678
ER -