TY - JOUR
T1 - Optimized homomorphic encryption solution for secure genome-wide association studies
AU - Blatt, Marcelo
AU - Gusev, Alexander
AU - Polyakov, Yuriy
AU - Rohloff, Kurt
AU - Vaikuntanathan, Vinod
N1 - Publisher Copyright:
© 2020 The Author(s).
PY - 2020/7/21
Y1 - 2020/7/21
N2 - Background: Genome-Wide Association Studies (GWAS) refer to observational studies of a genome-wide set of genetic variants across many individuals to see if any genetic variants are associated with a certain trait. A typical GWAS analysis of a disease phenotype involves iterative logistic regression of a case/control phenotype on a single-nucleotide polymorphism (SNP) with quantitative covariates. GWAS have been a highly successful approach for identifying genetic-variant associations with many poorly understood diseases. However, a major limitation of GWAS is the dependence on individual-level genotype/phenotype data and the corresponding privacy concerns. Methods: We present a solution for secure GWAS using homomorphic encryption (HE) that keeps all individual data encrypted throughout the association study. Our solution is based on an optimized semi-parallel GWAS compute model, a new Residue-Number-System (RNS) variant of the Cheon-Kim-Kim-Song (CKKS) HE scheme, novel techniques to switch between data encodings, and more than a dozen crypto-engineering optimizations. Results: Our prototype can perform the full GWAS computation for 1,000 individuals, 131,071 SNPs, and 3 covariates in about 10 minutes on a modern server computing node (with 28 cores). Our solution for a smaller dataset was awarded co-first place in iDASH'18 Track 2: "Secure Parallel Genome Wide Association Studies using HE". Conclusions: Many of the HE optimizations presented in our paper are general-purpose, and can be used in solving challenging problems with large datasets in other application domains.
AB - Background: Genome-Wide Association Studies (GWAS) refer to observational studies of a genome-wide set of genetic variants across many individuals to see if any genetic variants are associated with a certain trait. A typical GWAS analysis of a disease phenotype involves iterative logistic regression of a case/control phenotype on a single-nucleotide polymorphism (SNP) with quantitative covariates. GWAS have been a highly successful approach for identifying genetic-variant associations with many poorly understood diseases. However, a major limitation of GWAS is the dependence on individual-level genotype/phenotype data and the corresponding privacy concerns. Methods: We present a solution for secure GWAS using homomorphic encryption (HE) that keeps all individual data encrypted throughout the association study. Our solution is based on an optimized semi-parallel GWAS compute model, a new Residue-Number-System (RNS) variant of the Cheon-Kim-Kim-Song (CKKS) HE scheme, novel techniques to switch between data encodings, and more than a dozen crypto-engineering optimizations. Results: Our prototype can perform the full GWAS computation for 1,000 individuals, 131,071 SNPs, and 3 covariates in about 10 minutes on a modern server computing node (with 28 cores). Our solution for a smaller dataset was awarded co-first place in iDASH'18 Track 2: "Secure Parallel Genome Wide Association Studies using HE". Conclusions: Many of the HE optimizations presented in our paper are general-purpose, and can be used in solving challenging problems with large datasets in other application domains.
KW - Cryptography
KW - Genome-wide association studies
KW - Homomorphic encryption
UR - http://www.scopus.com/inward/record.url?scp=85088513339&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088513339&partnerID=8YFLogxK
U2 - 10.1186/s12920-020-0719-9
DO - 10.1186/s12920-020-0719-9
M3 - Article
C2 - 32693805
AN - SCOPUS:85088513339
SN - 1755-8794
VL - 13
JO - BMC Medical Genomics
JF - BMC Medical Genomics
M1 - 83
ER -