"Statistical Methods for Identifying Gene Expression and Epigenetic Sig" by Hongyu Li

Statistical Methods for Identifying Gene Expression and Epigenetic Signatures in Complex Traits

Date of Award

Spring 2023

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Public Health

First Advisor

Zhao, Hongyu

Abstract

Complex traits are influenced by multiple genetic and environmental factors and by complex interactions among these factors. Studies of complex traits typically require large sample sizes and sophisticated statistical analyses to identify the genetic and environmental factors that contribute to trait variability. Delineating these factors can help us better understand the causes of disease and develop more effective interventions and treatments. In this dissertation, I have developed novel statistical methods that can better identify disease-related gene-expression and epigenetic signatures using single-cell RNA-sequencing and bisulfite-sequencing DNA methylation data. In Chapter 1, we propose borrowing information from known biological networks to increase the statistical power to identify differentially expressed genes in single-cell data. We develop MRFscRNAseq, a statistical framework based on a Markov Random Field model that accommodates gene network information as well as dependencies among cell types to identify cell-type-specific differentially expressed genes. We implement an efficient Expectation-Maximization algorithm with a mean-field-like approximation to estimate model parameters and a Gibbs sampler to infer differential expression status. A simulation study shows that our method has greater power than existing methods while appropriately controlling the type I error rate. We demonstrate the usefulness of our method by applying it to a single-cell RNA-sequencing dataset to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis. In Chapter 2, we propose a novel statistical framework that incorporates network information among different brain regions to classify each CpG site by methylation pattern. 
Using brain cross-region modeling, we identified 6,641 candidate genes (at 17,274 locations) across the regions in a PTSD case-control methylation dataset from 61 PTSD donors and 53 control donors. Our results implicate DNA methylation (DNAm) as an epigenetic mechanism underlying the molecular changes within the subcortical fear circuitry of the brain in patients with PTSD. We also provide a comprehensive, multi-regional resource of DNA methylation variation in PTSD donors and its effect on downstream biological processes, which could be leveraged to identify new interventions for PTSD therapy. In Chapter 3, we propose GLEAM, a sparse group LASSO-based framework for enrichment analysis tailored to DNA methylation data, to better interpret the biological meaning of CpG sites discovered in differential methylation analysis. Extensive simulations demonstrate that our method is robust and powerful relative to existing methods. Applications of GLEAM to the ROSMAP Alzheimer's disease dataset and the UCLA autism spectrum disorder dataset reveal biologically meaningful results that can help to better understand disease pathology and develop effective treatments.

This document is currently not available here.