TY - JOUR
T1 - Elucidation of DNA methylation on N6-adenine with deep learning
AU - Tan, Fei
AU - Tian, Tian
AU - Hou, Xiurui
AU - Yu, Xiang
AU - Gu, Lei
AU - Mafra, Fernanda
AU - Gregory, Brian D.
AU - Wei, Zhi
AU - Hakonarson, Hakon
N1 - Funding Information:
We thank H. Liu for the partial data preprocessing. This study was supported by The Children’s Hospital of Philadelphia Endowed Chair in Genomic Research to H.H. and an Institutional Development Award to the Center for Applied Genomics from The Children’s Hospital of Philadelphia. This work was supported by Extreme Science and Engineering Discovery Environment (XSEDE) through allocations CIE160021 and CIE170034 (supported by National Science Foundation grant no. ACI-1548562).
Publisher Copyright:
© 2020, The Author(s), under exclusive licence to Springer Nature Limited.
PY - 2020/8/1
Y1 - 2020/8/1
N2 - Research on DNA methylation on N6-adenine (6mA) in eukaryotes has received much recent attention. Recent studies have generated a large amount of 6mA genomic data, yet the role of DNA 6mA in eukaryotes remains elusive, or even controversial. We argue that the sparsity of DNA 6mA in eukaryotes, the limitations of current biotechnologies for 6mA detection and the sophistication of the 6mA regulatory mechanism together pose great challenges for elucidation of DNA 6mA. To exploit existing 6mA genomic data and address this challenge, here we develop a deep-learning-based algorithm for predicting potential DNA 6mA sites de novo from sequence at single-nucleotide resolution, with application to three representative model organisms, Arabidopsis thaliana, Drosophila melanogaster and Escherichia coli. Extensive experiments demonstrate the accuracy of our algorithm and its superior performance compared with conventional k-mer-based approaches. Furthermore, our saliency maps-based context analysis protocol reveals interesting cis-regulatory patterns around the 6mA sites that are missed by conventional motif analysis. Our proposed analytical tools and findings will help to elucidate the regulatory mechanisms of 6mA and benefit the in-depth exploration of their functional effects. Finally, we offer a complete catalogue of potential 6mA sites based on in silico whole-genome prediction.
AB - Research on DNA methylation on N6-adenine (6mA) in eukaryotes has received much recent attention. Recent studies have generated a large amount of 6mA genomic data, yet the role of DNA 6mA in eukaryotes remains elusive, or even controversial. We argue that the sparsity of DNA 6mA in eukaryotes, the limitations of current biotechnologies for 6mA detection and the sophistication of the 6mA regulatory mechanism together pose great challenges for elucidation of DNA 6mA. To exploit existing 6mA genomic data and address this challenge, here we develop a deep-learning-based algorithm for predicting potential DNA 6mA sites de novo from sequence at single-nucleotide resolution, with application to three representative model organisms, Arabidopsis thaliana, Drosophila melanogaster and Escherichia coli. Extensive experiments demonstrate the accuracy of our algorithm and its superior performance compared with conventional k-mer-based approaches. Furthermore, our saliency maps-based context analysis protocol reveals interesting cis-regulatory patterns around the 6mA sites that are missed by conventional motif analysis. Our proposed analytical tools and findings will help to elucidate the regulatory mechanisms of 6mA and benefit the in-depth exploration of their functional effects. Finally, we offer a complete catalogue of potential 6mA sites based on in silico whole-genome prediction.
UR - http://www.scopus.com/inward/record.url?scp=85088866093&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088866093&partnerID=8YFLogxK
U2 - 10.1038/s42256-020-0211-4
DO - 10.1038/s42256-020-0211-4
M3 - Article
AN - SCOPUS:85088866093
SN - 2522-5839
VL - 2
SP - 466
EP - 475
JO - Nature Machine Intelligence
JF - Nature Machine Intelligence
IS - 8
ER -