Date of Award
Spring 2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computational Biology and Bioinformatics
First Advisor
Kluger, Yuval
Abstract
Single-cell sequencing techniques have revolutionized our understanding of cellular diversity within tissues and organs. Typically, single-cell data is preprocessed into a feature-by-cell count matrix, where feature values represent gene expression, chromatin accessibility, protein levels, etc. For computational analysis, cells are treated as points in a high-dimensional feature space. Under distinct conditions or due to experimental perturbations, specific cellular states may exhibit differential abundance. This can be detected by comparing cell density distributions within the feature space. In Chapter 1, we present a novel computational framework that employs a random-walk-based local two-sample test. This approach enables multiscale, cluster-free, differential cell abundance analysis with rigorous statistical guarantees. Through applications to real-world datasets, our approach captures meaningful variations in cell abundance between different biological conditions and provides new biological insights.
Beyond comparing cellular profiles across various datasets, each dataset itself often encapsulates a variety of cellular states that underlie dynamic processes such as cell cycle and tissue/organ differentiation. Current cell trajectory inference approaches use single-cell whole-transcriptome data to organize cells into lineages and assign pseudotime to them. However, many complex biological processes are orchestrated by multiple gene programs, some of which co-occur in an intertwined manner (e.g., cell differentiation coupled with cell cycle). If different sets of genes drive multiple processes in the same group of cells, these cells tend to organize into a manifold with an intrinsic dimension greater than 1. Consequently, determining a unidimensional lineage for these cells and constructing a meaningful cell pseudotime becomes impractical.
To overcome this limitation, we have developed a new approach, GeneTrajectory, that constructs trajectories of genes rather than trajectories of cells. GeneTrajectory automatically dissects out gene programs from the whole transcriptome, eliminating the need for initial cell trajectory construction or specification of the initial and terminal cell states for each process. This makes it broadly applicable, even to a cell cloud with a non-curvilinear geometric structure. Using this method, genes that sequentially contribute to a given biological process can be extracted and then organized into a gene trajectory. The ordering of genes along each gene trajectory implies the successive order of gene activity during the underlying biological process. By deconvolving co-occurring processes, each process (e.g., lineage differentiation) can be represented on its own, excluding irrelevant biological effects from the other processes. We demonstrate the utility and advantages of GeneTrajectory through applications to two real-world biological datasets in Chapter 2.
Throughout my Ph.D., we harnessed cutting-edge single-cell analysis methods to explore diverse biological systems. In Chapter 3, we summarize our endeavors to resolve a fast transition process during the early embryonic stage of mouse hair follicle genesis. We integrated comparative analysis tools to dissect out transcriptomic differences between multiple pairs of wildtype and mutant samples, unraveling the molecular mechanisms that orchestrate cellular fate during this developmental process.
Recommended Citation
Qu, Rihao, "Deriving Cell and Gene Dynamics from Single-cell Omics Data" (2024). Yale Graduate School of Arts and Sciences Dissertations. 1408.
https://elischolar.library.yale.edu/gsas_dissertations/1408