Model-based autoencoders for imputing discrete single-cell RNA-seq data

Tian Tian, Martin Renqiang Min, Zhi Wei

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

Deep neural networks have been widely applied to missing data imputation. However, most existing studies have focused on imputing continuous data, while discrete data imputation remains under-explored. Discrete data are common in the real world, especially in bioinformatics, genetics, and biochemistry. In particular, much recent genomic data consists of discrete counts generated by single-cell RNA sequencing (scRNA-seq) technology. Most scRNA-seq studies produce a discrete matrix in which ‘false’ zero counts (missing values) are prevalent. To make downstream analyses more effective, imputation, which recovers the missing values, is often performed as the first step in pre-processing scRNA-seq data. In this paper, we propose a novel Zero-Inflated Negative Binomial (ZINB) model-based autoencoder for imputing discrete scRNA-seq data. The novelties of our method are twofold. First, in addition to optimizing the ZINB likelihood, we explicitly model the dropout events that cause missing values by using the Gumbel-Softmax distribution. Second, the zero-inflated reconstruction is further optimized with respect to the raw count matrix. Extensive experiments on simulated datasets demonstrate that the zero-inflated reconstruction significantly improves imputation accuracy. Experiments on real data show that the proposed imputation improves the separation of different cell types and the accuracy of differential expression analysis.
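The two ingredients named in the abstract, the ZINB likelihood and a Gumbel-Softmax relaxation of dropout events, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the parameter names (`mu`, `theta`, `pi`), and the way the relaxed mask is combined with the mean in the last lines are assumptions made for illustration only.

```python
import math
import random


def zinb_log_prob(x, mu, theta, pi):
    """Log-probability of a count x under a zero-inflated negative binomial.

    Assumed parameterisation: mu is the NB mean (> 0), theta the NB
    dispersion (> 0), and pi the zero-inflation probability in [0, 1).
    """
    # Negative binomial log-pmf in the mean/dispersion form.
    log_nb = (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )
    if x == 0:
        # A zero is either an inflated ("false") zero or a genuine NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    return math.log(1.0 - pi) + log_nb


def gumbel_softmax(logits, temperature, rng=random):
    """Draw one relaxed one-hot sample from unnormalised categorical logits."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical usage: a relaxed dropout indicator for one matrix entry.
rng = random.Random(0)
mu, theta, pi = 2.0, 1.5, 0.3
p_drop, p_keep = gumbel_softmax([math.log(pi), math.log(1.0 - pi)], 0.5, rng)
relaxed_reconstruction = p_keep * mu  # assumed combination, for illustration
nll_of_zero = -zinb_log_prob(0, mu, theta, pi)
```

Lower temperatures push the relaxed sample towards a hard 0/1 dropout indicator, while higher temperatures keep it smooth enough to differentiate through in a gradient-based framework.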

Original language: English (US)
Pages (from-to): 112-119
Number of pages: 8
Journal: Methods
Volume: 192
State: Published - Aug 2021
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Molecular Biology
  • General Biochemistry, Genetics and Molecular Biology

Keywords

  • Deep learning
  • Imputation
  • scRNA-seq
