Project Details
Description
Project Summary
My lab develops artificial intelligence (AI) and machine learning (ML) algorithms and statistical methods to analyze
various genomic data under different experimental designs. With multidisciplinary training in computer science,
statistics, and biology, my research program focuses on developing the informatics of tomorrow in the context of
pressing biomedical application problems today, in collaboration with my colleagues in the biomedical field. All
methods developed in our lab are implemented as user-friendly, publicly available software packages to
maximize their impact. In the past five years, we have focused on single-cell genomics, transcriptomics, and
epigenomics. Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity
and complex biological systems. Despite advances in computational methods, including our lab's contributions,
the full potential of these datasets remains untapped due to the lack of powerful tools for integrating and
analyzing vast amounts of single-cell omics data. Additionally, emerging biotechnologies like cellular barcoding,
when coupled with single-cell sequencing, necessitate the development of new computational methods to fully
realize their potential. Therefore, our goals for the next five years include: (1) developing and optimizing large-
scale foundation models for single-cell omics data, (2) creating computational methods for barcoding single-cell
omics data, and (3) quantifying cell-type annotation uncertainty in scRNA-seq studies. We will develop innovative
AI/ML techniques to address computational challenges in single-cell large-scale foundation models (scLFMs),
integrate biological domain knowledge, incorporate other modalities and cross-species data, and develop metrics
to evaluate scLFM embedding quality. We will also create methods that leverage barcode information and
biological knowledge for clustering, cell-cell communications, and cell trajectory inference, as well as statistical
methods for detecting clones with longitudinal changes and identifying genes driving these changes. Lastly, we
will use conformal prediction to quantify cell-type annotation uncertainty in scRNA-seq studies. We will develop
a basic testing procedure to produce statistically valid prediction sets for each cell and a tree-based testing
procedure that considers the hierarchical structure of cell types. The proposed research builds upon our lab's
recent progress in developing deep learning methods for single-cell, epigenomic, and genetic data analysis,
as well as statistical methods for transcriptomic data analysis. We emphasize the importance of implementing
our proposed methods as user-friendly, open-source software tools to benefit the biomedical community.
The overall vision of the research program is to advance the development of computational methods for single-
cell omics data analysis, ultimately accelerating biological discovery and clinical applications.
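
As a rough illustration of the conformal prediction idea mentioned in the summary, the sketch below shows standard split conformal prediction applied to cell-type annotation. It is not the project's method: it assumes a hypothetical classifier that outputs per-cell probabilities over cell types, the function names `conformal_threshold` and `prediction_sets` are made up for this example, and the tree-based procedure for hierarchical cell types is not covered.

```python
import math
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Return the nonconformity cutoff from a held-out calibration set.

    cal_probs: (n_cells, n_types) predicted probabilities (assumed classifier output).
    cal_labels: (n_cells,) integer indices of the true cell types.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true cell type.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Rank of the finite-sample-corrected (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    # Too few calibration cells for this alpha: every type must be kept.
    return float("inf") if k > n else float(scores[k - 1])


def prediction_sets(test_probs, threshold):
    """Boolean mask; entry [i, k] is True when type k is in cell i's set."""
    return (1.0 - np.asarray(test_probs, dtype=float)) <= threshold


# Toy data: 4 calibration cells, 3 cell types (values are illustrative only).
cal_probs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
cal_labels = [0, 1, 2, 0]
q = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
sets = prediction_sets([[0.8, 0.15, 0.05], [0.5, 0.5, 0.0]], q)
```

Under the usual exchangeability assumption between calibration and test cells, each set contains the true cell type with probability at least 1 - alpha. A confidently classified cell gets a single-type set, while an ambiguous cell gets a larger one, which is how the uncertainty is reported.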
| Status | Active |
|---|---|
| Effective start/end date | 8/1/25 → 5/31/26 |
Funding
- National Institute of General Medical Sciences: $376,250.00