TY - JOUR
T1 - Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest
AU - Roshan, Usman
AU - Chikkagoudar, Satish
AU - Wei, Zhi
AU - Wang, Kai
AU - Hakonarson, Hakon
N1 - Funding Information:
The CIPRES cluster supported by the National Science Foundation (EF0331654) and the Kong cluster at NJIT; Wellcome Trust under award 076113. Funding for open access charge: U.S. National Science Foundation and U.S. National Institutes of Health.
PY - 2011/5
Y1 - 2011/5
AB - We study the number of causal variants and associated regions identified by top SNPs in rankings given by the popular 1-df chi-squared statistic, the support vector machine (SVM) and the random forest (RF) on simulated and real data. If we apply the SVM and RF to the top 2r chi-squared-ranked SNPs, where r is the number of SNPs with P-values significant after Bonferroni correction, we find that both improve the ranks of causal variants and associated regions and achieve higher power on simulated data. These improvements, however, as well as the stability of the SVM and RF rankings, progressively decrease as the cutoff increases to 5r and 10r. As applications, we compare the ranks of previously replicated SNPs in real data, associated regions in type 1 diabetes, as provided by the Type 1 Diabetes Consortium, and disease risk prediction accuracies as given by top-ranked SNPs by the three methods. Software and webserver are available at http://svmsnps.njit.edu.
UR - http://www.scopus.com/inward/record.url?scp=79955984463&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955984463&partnerID=8YFLogxK
U2 - 10.1093/nar/gkr064
DO - 10.1093/nar/gkr064
M3 - Article
C2 - 21317188
AN - SCOPUS:79955984463
SN - 0305-1048
VL - 39
SP - e62
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 9
ER -