TY - JOUR
T1 - Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data
AU - Tian, Tian
AU - Zhang, Jie
AU - Lin, Xiang
AU - Wei, Zhi
AU - Hakonarson, Hakon
N1 - Funding Information:
This work was supported by Extreme Science and Engineering Discovery Environment (XSEDE) through allocation CIE160021 and CIE170034 (supported by National Science Foundation Grant No. ACI-1548562). We thank Akaash Patel for proofreading the manuscript which improved the clarity of the paper.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12/1
Y1 - 2021/12/1
N2 - Clustering is a critical step in single cell-based studies. Most existing methods support unsupervised clustering without the a priori exploitation of any domain knowledge. When confronted by the high dimensionality and pervasive dropout events of scRNA-Seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment. In such cases, the only recourse is for the user to manually and repeatedly tweak clustering parameters until acceptable clusters are found. Consequently, the path to obtaining biologically meaningful clusters can be ad hoc and laborious. Here we report a principled clustering method named scDCC, that integrates domain knowledge into the clustering step. Experiments on various scRNA-seq datasets from thousands to tens of thousands of cells show that scDCC can significantly improve clustering performance, facilitating the interpretability of clusters and downstream analyses, such as cell type assignment.
AB - Clustering is a critical step in single cell-based studies. Most existing methods support unsupervised clustering without the a priori exploitation of any domain knowledge. When confronted by the high dimensionality and pervasive dropout events of scRNA-Seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment. In such cases, the only recourse is for the user to manually and repeatedly tweak clustering parameters until acceptable clusters are found. Consequently, the path to obtaining biologically meaningful clusters can be ad hoc and laborious. Here we report a principled clustering method named scDCC, that integrates domain knowledge into the clustering step. Experiments on various scRNA-seq datasets from thousands to tens of thousands of cells show that scDCC can significantly improve clustering performance, facilitating the interpretability of clusters and downstream analyses, such as cell type assignment.
UR - http://www.scopus.com/inward/record.url?scp=85103409311&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103409311&partnerID=8YFLogxK
U2 - 10.1038/s41467-021-22008-3
DO - 10.1038/s41467-021-22008-3
M3 - Article
C2 - 33767149
AN - SCOPUS:85103409311
SN - 2041-1723
VL - 12
JO - Nature communications
JF - Nature communications
IS - 1
M1 - 1873
ER -