TY - JOUR
T1 - Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure
AU - Abegaz, Fentaw
AU - Van Lishout, François
AU - Mahachie John, Jestinah M.
AU - Chiachoompu, Kridsadakorn
AU - Bhardwaj, Archana
AU - Duroux, Diane
AU - Gusareva, Elena S.
AU - Wei, Zhi
AU - Hakonarson, Hakon
AU - Van Steen, Kristel
N1 - Funding Information:
This research was in part funded by the Fonds de la Recherche Scientifique (F.N.R.S.), in particular, “Integrated complex traits epistasis kit” (Convention n° 2.4609.11) [KVS]. We also acknowledge research opportunities offered by F.N.R.S., including “Foresting in Integromics Inference” (Convention n° T.0180.13) [KC], and by the interuniversity research institute Walloon Excellence in Lifesciences and BIOtechnology (WELBIO) [FA, KVS].
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - Background: In genome-wide association studies, the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding, the most popular of which is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods: To identify causal variants in synergy and to improve the interpretability and replicability of epistasis results, we introduce three strategies for structured populations based on a model-based multifactor dimensionality reduction approach, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. Results: Simulation results comparing the performance of various approaches show that, in the presence of population structure, MBMDR-PC and MBMDR-PG consistently control the type I error rate at the nominal level better than MBMDR-GC. Moreover, all three proposed methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion: Through extensive simulation studies, we demonstrate the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity.
KW - Confounding
KW - Epistasis
KW - GWAIS
KW - GWAS
KW - Gene-gene interaction
KW - MB-MDR
KW - Population stratification
KW - Population structure
KW - Principal components
UR - http://www.scopus.com/inward/record.url?scp=85101207343&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101207343&partnerID=8YFLogxK
U2 - 10.1186/s13040-021-00247-w
DO - 10.1186/s13040-021-00247-w
M3 - Article
AN - SCOPUS:85101207343
SN - 1756-0381
VL - 14
JO - BioData Mining
JF - BioData Mining
IS - 1
M1 - 16
ER -