TY - JOUR
T1 - An active learning approach for clustering single-cell RNA-seq data
AU - Lin, Xiang
AU - Liu, Haoran
AU - Wei, Zhi
AU - Roy, Senjuti Basu
AU - Gao, Nan
N1 - Publisher Copyright:
© 2021, The Author(s), under exclusive licence to United States and Canadian Academy of Pathology.
PY - 2022/3
Y1 - 2022/3
N2 - Single-cell RNA sequencing (scRNA-seq) data have been widely used to profile cellular heterogeneity at high resolution. Clustering is a crucial step in scRNA-seq data analysis because it makes it possible to identify previously undiscovered cell types. Most methods for clustering scRNA-seq data use an unsupervised learning strategy. Because the clustering step is separated from the cell annotation and labeling step, these methods not uncommonly produce unexpected clusterings with poor biological interpretability, a result biologists generally want to avoid. To address this problem, we propose an active learning (AL) framework for clustering scRNA-seq data. The AL model employs a learning algorithm that actively queries biologists for labels, so manual labeling is expected to be needed for only a subset of cells. To develop an optimal active learning approach, we explored several key parameters of the AL model in experiments on four real scRNA-seq datasets. We demonstrate that the proposed AL model outperforms state-of-the-art unsupervised clustering methods with fewer than 1000 labeled cells. We therefore conclude that the AL model is a promising tool for clustering scRNA-seq data that achieves superior performance effectively and efficiently.
AB - Single-cell RNA sequencing (scRNA-seq) data have been widely used to profile cellular heterogeneity at high resolution. Clustering is a crucial step in scRNA-seq data analysis because it makes it possible to identify previously undiscovered cell types. Most methods for clustering scRNA-seq data use an unsupervised learning strategy. Because the clustering step is separated from the cell annotation and labeling step, these methods not uncommonly produce unexpected clusterings with poor biological interpretability, a result biologists generally want to avoid. To address this problem, we propose an active learning (AL) framework for clustering scRNA-seq data. The AL model employs a learning algorithm that actively queries biologists for labels, so manual labeling is expected to be needed for only a subset of cells. To develop an optimal active learning approach, we explored several key parameters of the AL model in experiments on four real scRNA-seq datasets. We demonstrate that the proposed AL model outperforms state-of-the-art unsupervised clustering methods with fewer than 1000 labeled cells. We therefore conclude that the AL model is a promising tool for clustering scRNA-seq data that achieves superior performance effectively and efficiently.
UR - http://www.scopus.com/inward/record.url?scp=85109957236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85109957236&partnerID=8YFLogxK
U2 - 10.1038/s41374-021-00639-w
DO - 10.1038/s41374-021-00639-w
M3 - Article
C2 - 34244616
AN - SCOPUS:85109957236
SN - 0023-6837
VL - 102
SP - 227
EP - 235
JO - Laboratory Investigation
JF - Laboratory Investigation
IS - 3
ER -