DeepPolyA: A Convolutional Neural Network Approach for Polyadenylation Site Prediction

Xin Gao, Jie Zhang, Zhi Wei, Hakon Hakonarson

Research output: Contribution to journal › Article › peer-review

36 Scopus citations

Abstract

Polyadenylation (poly(A)) plays crucial roles in gene regulation, especially in messenger RNA metabolism, protein diversification, and protein localization. Accurate prediction of polyadenylation sites and identification of the motifs that control polyadenylation are fundamental for interpreting patterns of gene expression, improving the accuracy of genome annotation, and understanding the mechanisms that govern gene regulation. Despite considerable advances in applying machine learning techniques to this problem, their effectiveness is still limited by the lack of experience and domain knowledge needed to carefully design and generate useful features, especially for plants. With the increasing availability of extensive genomic data sets and leading computational techniques, deep learning methods, especially convolutional neural networks, have been applied to automatically identify and understand gene regulation directly from gene sequences and to predict unknown sequence profiles. Here, we present DeepPolyA, a new deep convolutional neural network-based approach for predicting polyadenylation sites from gene sequences of the plant Arabidopsis thaliana. We investigate various deep neural network architectures and evaluate their performance against classical machine learning algorithms and several popular deep learning models. Experimental results demonstrate that DeepPolyA substantially outperforms competing methods across various performance metrics. We further visualize the motifs learned by DeepPolyA to provide insight into our model and the learned polyadenylation signals.
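
To make the idea of applying a convolutional filter directly to a gene sequence concrete, the following is a minimal sketch and not the DeepPolyA architecture, whose layers and hyperparameters are not given in this abstract. It one-hot encodes a DNA string and slides a single hand-written filter along it, which is the operation a first convolutional layer performs with learned weights. The function names, the toy sequence, and the choice of AATAAA (a well-known polyadenylation signal hexamer) as the filter are all assumptions made for illustration.

```python
# Illustrative sketch only: one hand-written filter, not a trained network.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) filter along the sequence; one score per start position."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        scores.append(sum(window[i][j] * kernel[i][j]
                          for i in range(width) for j in range(4)))
    return scores

def relu(values):
    """Rectified linear activation: negative scores are clipped to zero."""
    return [max(0.0, v) for v in values]

# Hypothetical filter: +1 for the matching base at each position, -1 otherwise.
MOTIF = "AATAAA"
kernel = [[1.0 if b == base else -1.0 for b in BASES] for base in MOTIF]

# Toy sequence (assumption) containing the hexamer at 0-based offset 4.
sequence = "GCGTAATAAAGCTT"
activations = relu(conv1d_scan(one_hot(sequence), kernel))
best = max(range(len(activations)), key=activations.__getitem__)
print(best, activations[best])
```

In a trained network the filter weights are learned from labeled sequences rather than written by hand, and visualizing those learned weights is what allows them to be read as sequence motifs.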

Original language: English (US)
Pages (from-to): 24340-24349
Number of pages: 10
Journal: IEEE Access
Volume: 6
State: Published - Apr 11 2018
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering

Keywords

  • Polyadenylation prediction
  • deep learning
  • genomics and machine learning algorithms
  • motif discovery
  • multi-layer neural network
