TY - JOUR
T1 - Clustering single-cell RNA-seq data with a model-based deep learning approach
AU - Tian, Tian
AU - Wan, Ji
AU - Song, Qi
AU - Wei, Zhi
N1 - Publisher Copyright:
© 2019, The Author(s), under exclusive licence to Springer Nature Limited.
PY - 2019/4/1
Y1 - 2019/4/1
AB - Single-cell RNA sequencing (scRNA-seq) promises to provide higher resolution of cellular differences than bulk RNA sequencing. Clustering transcriptomes profiled by scRNA-seq has been routinely conducted to reveal cell heterogeneity and diversity. However, clustering analysis of scRNA-seq data remains a statistical and computational challenge, due to the pervasive dropout events obscuring the data matrix with prevailing ‘false’ zero count observations. Here, we have developed scDeepCluster, a single-cell model-based deep embedded clustering method, which simultaneously learns feature representation and clustering via explicit modelling of scRNA-seq data generation. Based on testing extensive simulated data and real datasets from four representative single-cell sequencing platforms, scDeepCluster outperformed state-of-the-art methods under various clustering performance metrics and exhibited improved scalability, with running time increasing linearly with sample size. Its accuracy and efficiency make scDeepCluster a promising algorithm for clustering large-scale scRNA-seq data.
UR - http://www.scopus.com/inward/record.url?scp=85073831292&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073831292&partnerID=8YFLogxK
U2 - 10.1038/s42256-019-0037-0
DO - 10.1038/s42256-019-0037-0
M3 - Article
AN - SCOPUS:85073831292
SN - 2522-5839
VL - 1
SP - 191
EP - 198
JO - Nature Machine Intelligence
JF - Nature Machine Intelligence
IS - 4
ER -