"Statistical Methods for Modelling Structures in High-dimensional Data" by Chang Su

Date of Award

Spring 2023

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Statistics and Data Science

First Advisor

Zhao, Hongyu

Abstract

With advances in genotyping arrays and high-throughput sequencing technologies, the fields of genetics and genomics have witnessed a surge in the collection and analysis of high-dimensional data in recent decades. These data, collected from large cohorts of individuals or tissues with heterogeneous cell types, often encompass complex underlying structures. For instance, major variations in genetic variants capture the ancestry background of profiled individuals, and gene co-expression networks encode the biological functions and molecular mechanisms in tissues and cell types. However, the high dimensionality and biological complexity of these data present challenges for statistical estimation and inference with traditional methods. To tackle these challenges, I have developed three statistical methods for modelling underlying structures in high-dimensional genetics and genomics data. In Chapter 1, I present a statistical method that extends principal component analysis to the high-dimensional regime using an Empirical Bayes approach, which gives improved estimates of ancestry background for diverse populations using genotype data. In Chapters 2 and 3, I present two statistical methods for estimating cell-type-specific gene co-expression networks using either bulk or single-cell RNA sequencing data, which uncover biological pathways in specific cell types and offer more refined insights than existing tissue-specific co-expression networks. Taken together, these three new statistical methods offer more accurate and precise insights into the underlying structures of high-dimensional genetics and genomics data.
