Date of Award

Spring 2022

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Public Health

First Advisor

Zhao, Hongyu

Abstract

Over the past fifteen years, genome-wide association studies (GWAS) have identified tens of thousands of single-nucleotide polymorphisms (SNPs) associated with complex human traits and diseases. Beyond locating risk loci, GWAS data can also be used to estimate genetic covariance, which provides insights into the genetic basis of complex traits and diseases. Genetic covariance is the covariance of the genetic effects contributing to two phenotypes. Methods based on the linear mixed model (LMM) combined with the restricted maximum likelihood (REML) algorithm have been developed to estimate this quantity. However, LMM-based methods have seen only limited adoption because they require individual-level genotype and phenotype data, which are often difficult to obtain owing to policy and privacy concerns. Because GWAS summary statistics are easy to access and computationally efficient to analyze, methods that require only summary statistics as input have become more popular than methods that use individual-level genotype data. In Chapter 1, we present a benchmark study of summary-statistics-based genetic covariance estimation methods using simulations and real data applications. We focus on two major technical challenges in estimating genetic covariance: marker dependency caused by linkage disequilibrium (LD) and sample overlap between studies. To improve the interpretability of global genetic covariance, local genetic covariance analysis can reveal heterogeneous patterns of etiological sharing between complex traits and is critical for understanding the genetic basis of phenotypic correlations among traits. In Chapter 2, we introduce SUPERGNOVA, a statistical framework that estimates local genetic covariance using GWAS summary statistics.
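The definition of genetic covariance can be made concrete with a small simulation. The Python sketch that follows is illustrative only and is not one of the estimators studied in this thesis. It assumes independent, standardized SNPs (no LD), a single sample in which both traits are measured, and per-SNP effects drawn from a bivariate normal distribution; the function name `genetic_covariance_demo` and its parameters (`h2`, `rho`, and so on) are hypothetical. It checks that the empirical covariance of the genetic components g1 = X beta1 and g2 = X beta2 is close to the sum of per-SNP effect products beta1_j * beta2_j.

```python
import numpy as np


def genetic_covariance_demo(n=5000, m=200, h2=0.5, rho=0.6, seed=1):
    """Toy simulation of genetic covariance between two traits (hypothetical setup).

    Assumes independent SNPs (no LD) standardized to mean 0 and variance 1,
    and per-SNP effects (beta1_j, beta2_j) drawn from a bivariate normal with
    per-trait variance h2 / m and correlation rho.
    Returns (empirical covariance of g1 and g2, sum of beta1_j * beta2_j).
    """
    rng = np.random.default_rng(seed)

    # Independent SNPs with allele frequencies drawn from [0.05, 0.5).
    freqs = rng.uniform(0.05, 0.5, size=m)
    genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)
    # Standardize each SNP using its true allele frequency.
    x = (genotypes - 2 * freqs) / np.sqrt(2 * freqs * (1 - freqs))

    # Correlated per-SNP effects on the two traits.
    cov = (h2 / m) * np.array([[1.0, rho], [rho, 1.0]])
    betas = rng.multivariate_normal(np.zeros(2), cov, size=m)
    beta1, beta2 = betas[:, 0], betas[:, 1]

    # Genetic components of each trait in the sample.
    g1, g2 = x @ beta1, x @ beta2

    empirical = np.cov(g1, g2)[0, 1]
    expected = float(beta1 @ beta2)
    return float(empirical), expected


if __name__ == "__main__":
    emp, exp_sum = genetic_covariance_demo()
    # Both values should be close to rho * h2 = 0.3, up to sampling noise.
    print(f"empirical Cov(g1, g2) = {emp:.3f}; sum of beta1*beta2 = {exp_sum:.3f}")
```

When SNPs are in LD, the covariance of the genetic components becomes beta1' R beta2 for the LD correlation matrix R rather than the plain sum of products, which is one reason summary-statistics-based estimators must model LD explicitly.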
As a case study illustrating the power of SUPERGNOVA, we performed in-depth analyses to dissect the shared genetics of autism spectrum disorder (ASD) and cognitive ability. Given the biological differences between two sets of genomic regions with opposite correlations between ASD and cognitive performance (CP), we concluded that the ‘paradoxical’ genetic covariance between these traits could be explained by genetic heterogeneity. With the increasing accessibility of individual-level data from GWAS, it is now common for researchers to have individual-level data for some traits in a specific population. In Chapter 3, we introduce GENJI, a method that estimates within-population or transethnic genetic covariance from individual-level data for one trait and summary-level data for another. Through extensive simulations and real-data analyses of within-population and transethnic genetic correlation, we show that GENJI produces more reliable and efficient estimates than summary-data-based methods. In summary, this thesis first reviews methods for estimating genetic covariance from summary data (Chapter 1). It then introduces new methods that estimate genetic covariance from different angles: SUPERGNOVA estimates local genetic covariance (Chapter 2), and GENJI estimates genetic covariance jointly from individual-level and summary data (Chapter 3). We believe this thesis will contribute to future research on the theory, methodology, and applications of genetic covariance in human genetics.
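The relationship between local and global genetic covariance, and how opposite-direction regions can coexist with a single genome-wide value, can be sketched in a toy setting. Under the illustrative assumptions that SNPs are standardized and that the genome is partitioned into LD-independent regions with LD matrices R_r, the genetic covariance decomposes into a sum of regional terms beta1_r' R_r beta2_r. The function `local_genetic_covariance` and the toy effect sizes are hypothetical and do not reproduce SUPERGNOVA's estimator, which works from GWAS summary statistics; they only show how regional terms of opposite sign partly cancel in the genome-wide total.

```python
import numpy as np


def local_genetic_covariance(beta1, beta2, region_ids, ld_blocks=None):
    """Split a genetic covariance into per-region terms beta1_r' R_r beta2_r.

    Illustrative assumptions (hypothetical, not SUPERGNOVA's estimator):
    SNPs are standardized, regions are LD-independent, and ld_blocks maps a
    region id to that region's LD (correlation) matrix R_r. Regions missing
    from ld_blocks are treated as having no LD (R_r = identity).
    Returns (dict of local covariances, global covariance).
    """
    beta1 = np.asarray(beta1, dtype=float)
    beta2 = np.asarray(beta2, dtype=float)
    region_ids = np.asarray(region_ids)
    ld_blocks = ld_blocks or {}

    local = {}
    for region in np.unique(region_ids):
        key = region.item()  # plain Python id for dictionary lookups
        mask = region_ids == region
        b1, b2 = beta1[mask], beta2[mask]
        ld = ld_blocks.get(key, np.eye(int(mask.sum())))
        local[key] = float(b1 @ ld @ b2)

    # Block-diagonal LD: the global covariance is the sum of the local terms.
    return local, sum(local.values())


if __name__ == "__main__":
    # Toy effect sizes: trait B agrees with trait A in region 0 and opposes it in region 1.
    beta_a = [0.1, 0.2, 0.1, 0.2]
    beta_b = [0.1, 0.2, -0.1, -0.1]
    local, total = local_genetic_covariance(beta_a, beta_b, [0, 0, 1, 1])
    print(local, total)  # local terms ~0.05 and ~-0.03; global ~0.02
```

In this toy example the modest positive global value hides two regional signals of opposite sign, which is the kind of heterogeneity that a local analysis is designed to expose.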
