TY - JOUR
T1 - DeepCNV
T2 - A deep learning approach for authenticating copy number variations
AU - Glessner, Joseph T.
AU - Hou, Xiurui
AU - Zhong, Cheng
AU - Zhang, Jie
AU - Khan, Munir
AU - Brand, Fabian
AU - Krawitz, Peter
AU - Sleiman, Patrick M.A.
AU - Hakonarson, Hakon
AU - Wei, Zhi
N1 - Publisher Copyright: © 2021 The Author(s). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].
PY - 2021/9/1
Y1 - 2021/9/1
AB - Copy number variations (CNVs) are an important class of variations contributing to the pathogenesis of many disease phenotypes. Detecting CNVs from genomic data remains difficult, and most currently applied methods suffer from an unacceptably high false positive rate. A common practice is to have human experts manually review original CNV calls to filter out false positives before further downstream analysis or experimental validation. Here, we propose DeepCNV, a deep learning-based tool, intended to replace human experts when validating CNV calls, focusing on the calls made by one of the most accurate CNV callers, PennCNV. The sophistication of the deep neural network algorithm is enriched with over 10 000 expert-scored samples that are split into training and testing sets. Variant confidence, especially for CNVs, is a main roadblock impeding the progress of linking CNVs with disease. We show that DeepCNV adds to the confidence of the CNV calls with an optimal area under the receiver operating characteristic curve of 0.909, exceeding other machine learning methods. The superiority of DeepCNV was also benchmarked and confirmed using an experimental wet-lab validation dataset. We conclude that the improvement obtained by DeepCNV results in significantly fewer false positive results and failures to replicate the CNV association results.
KW - copy number variation
KW - deep learning
UR - http://www.scopus.com/inward/record.url?scp=85116172783&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85116172783&partnerID=8YFLogxK
U2 - 10.1093/bib/bbaa381
DO - 10.1093/bib/bbaa381
M3 - Review article
C2 - 33429424
AN - SCOPUS:85116172783
SN - 1467-5463
VL - 22
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 5
M1 - bbaa381
ER -