"Statistical Methods in High-Throughput Biomedical Data with Applicatio" by Yunqing Liu

Date of Award

Fall 2023

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Public Health

First Advisor

Wang, Zuoheng

Abstract

Precision medicine has emerged as a critical approach in healthcare due to its potential to revolutionize patient care. Traditional medical practices often adopt a one-size-fits-all approach, treating patients based on generalized guidelines. However, individuals exhibit significant variation in their genetic makeup, lifestyle factors, and environmental exposures. Precision medicine recognizes and leverages these differences to deliver personalized treatment plans tailored to each patient's unique characteristics. Recent technological advances have enabled the generation of high-throughput data, such as single-cell RNA sequencing (scRNA-seq) data, spatial transcriptomics (ST) data, and locomotor activity assays measuring behavior, which offer great potential for the development of precision medicine. However, the analysis of such data presents unique statistical challenges. In this dissertation, I present novel statistical methods for disease-related biomedical research. The proposed approaches aim to advance precision medicine by enabling more accurate identification of gene targets, improved deconvolution of cell type compositions, and the exploration of potential therapeutic drugs for specific disease subtypes. The first chapter gives a comprehensive introduction to the background. In Chapter 2, we focus on cell type-specific differential expression analysis in scRNA-seq data, which allows the identification of cell type-specific genes as potential biomarkers and drug targets for personalized medicine. Here we introduce iDESC, a statistical method for detecting cell type-specific, disease-related differentially expressed (DE) genes in high-throughput scRNA-seq data. iDESC employs a zero-inflated negative binomial mixed model, which accounts for subject effects and dropout events.
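The abstract does not give iDESC's exact parameterization, so the following is only a minimal sketch of a generic zero-inflated negative binomial (ZINB) likelihood with a random subject intercept. It assumes a log link, a per-cell size-factor offset, a single disease indicator, and a constant dropout probability; the names `nb_logpmf`, `zinb_logpmf`, `cell_mean`, and `conditional_loglik` are hypothetical. The code evaluates the likelihood conditional on given subject effects rather than fitting the mixed model.

```python
import math

def nb_logpmf(y, mu, theta):
    """Log-pmf of a negative binomial with mean mu and dispersion (size) theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def zinb_logpmf(y, mu, theta, pi):
    """Log-pmf of a ZINB: with probability pi (assumed in [0, 1)) the count
    is an excess 'dropout' zero; otherwise it is drawn from NB(mu, theta)."""
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(nb_logpmf(0, mu, theta)))
    return math.log(1 - pi) + nb_logpmf(y, mu, theta)

def cell_mean(beta0, beta_disease, disease, subject_effect, log_size_factor):
    """Hypothetical log-linear mean: a fixed disease effect plus a random
    subject intercept, so subject-level shifts are not attributed to disease."""
    return math.exp(beta0 + beta_disease * disease + subject_effect + log_size_factor)

def conditional_loglik(cells, beta0, beta_disease, theta, pi, subject_effects):
    """Sum ZINB log-likelihoods over cells, conditional on subject effects.
    Each cell is a tuple (count, disease indicator, subject id, log size factor)."""
    total = 0.0
    for y, disease, subject, log_sf in cells:
        mu = cell_mean(beta0, beta_disease, disease, subject_effects[subject], log_sf)
        total += zinb_logpmf(y, mu, theta, pi)
    return total

# Toy usage: four cells of one gene from two subjects (one control, one case).
cells = [(0, 0, "s1", 0.0), (3, 0, "s1", 0.0), (5, 1, "s2", 0.0), (0, 1, "s2", 0.0)]
ll = conditional_loglik(cells, 0.5, 0.7, 2.0, 0.2, {"s1": -0.1, "s2": 0.2})
```

In a generic mixed-model fit, the subject effects would be treated as draws from a normal distribution and integrated out (for example, by Laplace approximation or quadrature) before maximizing over the fixed effects, dispersion, and zero-inflation parameters; a test on the disease coefficient would then flag DE genes.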
We evaluate iDESC alongside eleven existing DE analysis methods using both simulated and real scRNA-seq datasets. Our findings demonstrate that iDESC outperforms existing methods, providing more accurate and robust DE analysis results. By separating subject effects from disease effects and accounting for dropouts, iDESC effectively identifies disease-related DE genes, underscoring the importance of these factors in scRNA-seq data analysis. Chapter 3 discusses how cell type deconvolution in spatial transcriptomics data helps identify specific cell populations within the tissue microenvironment, allowing personalized treatments such as immunotherapies to target the patient's unique cellular landscape. We present SDePER, a hybrid machine learning and regression method designed to deconvolve spatial transcriptomics data using reference scRNA-seq data from the same tissue type. SDePER effectively addresses platform effects and incorporates sparsity and spatial correlation in cell type compositions. We evaluate SDePER and six existing methods using simulations and real datasets, demonstrating its superior accuracy and robustness. These results highlight the importance of considering platform effects, sparsity, and spatial correlation in cell type deconvolution, enabling enhanced resolution in estimating cell type proportions and gene expression. In Chapter 4, we study drug repurposing for autism spectrum disorder (ASD) using high-throughput behavioral assays in zebrafish. We characterize ASD subgroups and develop a novel drug repurposing strategy, Z-rescue, to identify potential therapeutic drugs for specific ASD subtypes associated with the risk genes DYRK1A and SCN1A/SCN2A. Leveraging zebrafish as a model organism and combining genetic insights with behavioral analysis, we aim to contribute to the understanding of ASD and related disorders, as well as to the discovery of effective treatments.
This drug repurposing strategy can be extended to analyze various types of biological datasets from other species, such as gene expression data, thus enabling the identification of new therapeutic uses for existing drugs and accelerating the development of precision medicine.
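The reference-based deconvolution task described for Chapter 3 can be illustrated with a deliberately simple baseline: model one spot's expression as a non-negative combination of cell-type mean profiles from reference scRNA-seq data, solve a non-negative least squares (NNLS) problem, and rescale the weights to proportions. This is not SDePER. It ignores the platform effects, sparsity, and spatial correlation that SDePER is described as handling. The function name `deconvolve_spot`, the projected-gradient solver, and the toy reference matrix are assumptions for illustration only.

```python
import numpy as np

def deconvolve_spot(reference: np.ndarray, spot: np.ndarray,
                    n_iter: int = 5000) -> np.ndarray:
    """Estimate cell-type proportions for one spot by NNLS (baseline only).

    reference: genes x cell types matrix of mean expression per cell type
    spot: observed expression vector for the spot (length = number of genes)
    """
    # Step size 1/L, where L = largest singular value squared, i.e. the
    # Lipschitz constant of the least-squares gradient.
    step = 1.0 / np.linalg.norm(reference, 2) ** 2
    w = np.zeros(reference.shape[1])
    for _ in range(n_iter):
        grad = reference.T @ (reference @ w - spot)
        w = np.maximum(w - step * grad, 0.0)  # gradient step, then project onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w  # rescale abundances to proportions

# Toy usage: 4 genes x 3 cell types, noiseless mixture of 100 cells.
reference = np.array([[10.0, 0.0, 1.0],
                      [0.0, 8.0, 1.0],
                      [1.0, 1.0, 12.0],
                      [5.0, 5.0, 0.0]])
true_props = np.array([0.5, 0.3, 0.2])
spot = reference @ (100 * true_props)
estimate = deconvolve_spot(reference, spot)  # close to [0.5, 0.3, 0.2]
```

Projected gradient descent is used only to keep the sketch dependency-light; `scipy.optimize.nnls` solves the same NNLS problem directly. Methods that borrow strength across neighboring spots would add a spatial penalty to this per-spot objective.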
